PROBLEM


In a previous study Eisenberg² investigated the interpretations
given by college students to the twenty-five best items in the Thurstone 
Neurotic Inventory. He found that: 


(1) The same questions are interpreted in different ways by different 
individuals. (2) Some people answer a question Yes for the same reason 
and with the same interpretation that others answer the same question No or ?. 
The average amount of overlap is forty-four per cent for all the questions. 
(3) Many responses to questions are equivocal when viewed in the light of 
their interpretation, since the reservations made and the reasons given render 
the question ambiguous. The average amount of equivocality is thirty-two 
per cent when ? responses are considered equivocal and twenty-two per cent 
when they are not considered equivocal. (4) Thirteen per cent of all responses 
are ? responses; this is viewed as a partial measure of the difficulty in inter- 
preting the question. (5) The three criteria, overlap, equivocality, and per 
cent ? responses are positively related to each other. 


Earlier research on the consistency of questionnaire responses 
reveals that retest response changes vary from fifteen per cent to 
thirty-five per cent depending upon the questions used, the subjects 
used, and the length of the interval between testings. In view of 
Eisenberg’s study it appeared that the reason for changes in response 
might be understood by examination of the individual’s interpretations 
of the questions. Landis® has suggested that changes in response are
due to changes in interpretation of the question. It would seem from 
this that unchanged responses are due to unchanged interpretations. 
Should this prove to be correct, questionnaires could easily be improved 
by decreasing the variability of interpretation. 

A concomitant problem is raised as to which factors might con- 
tribute towards the difference between good and poor items. The 
difference may be partially accounted for by the ambiguity of inter- 
pretation, by the nature of the response, or by the consistency of the 
response and interpretation. It is apparent that answers to these 
questions would aid considerably in the improvement of questionnaires. 


PROCEDURE 


1. The Questionnaire.—The questionnaire consisted of the following 
twenty-six items of the neurotic inventory type to be answered either 
Yes, No, or ?. All these items were taken from the Thurstone Neurotic
Inventory. The good items were among those that Thurstone,⁹
Willoughby,¹⁰ and Eisenberg² found to discriminate best between high-
and low-scoring subjects. Willoughby also discovered that these 
items were most “cohesive,” (that is, were the most internally con- 
sistent by the use of factor analysis techniques); and Eisenberg indi- 
cated that these items were interpreted with the least amount of 
equivocality and overlap, and were responded to with the smallest 
percentages of ? response. The poor items were among those that 
Thurstone found to discriminate least between high- and low-scoring 
subjects. Only those items were used which gave a description of the 
individual’s personality; background and symptom questions were 
eliminated. 

GOOD ITEMS

1. Do you worry too long over humiliating experiences?
3. Are you afraid of falling when you are on a high place?
4. Are your feelings easily hurt?
7. Are you happy and sad by turns without knowing why?
10. Are you troubled with shyness?
11. Do you daydream frequently?
12. Do you get discouraged easily?
15. Do you say things on the spur of the moment and then regret them?
18. Does it bother you to have people watch you work even when you do it well?
19. At a reception or tea do you avoid meeting the important person?
22. Are you often lonely?
24. Are you self-conscious before superiors?
25. Is it hard for you to make up your mind until the time for action is past?


POOR ITEMS

2. Are there many people that you dislike intensely?
5. Do you laugh infrequently?
6. Do you feel that life is a great burden?
8. Do you usually distrust people?
9. Do people think you are selfish?
13. Do you lose your head easily in a dangerous situation?
14. Do you dislike to take on responsibilities?
16. Do you often get bored with people you meet?
17. Do you have a habit of leaving a lot of tasks unfinished?
20. Does it upset you to lose in a competitive game?
21. Do you get tired of amusements quickly?
23. Do a great many things frighten you?
26. Do you lose your temper quickly?

The numbers before the items indicate the order in which they were
presented. All questions are worded in such a way that the Yes response is
considered the “neurotic.”

The questionnaire was administered to one hundred seventy Brooklyn
College students during the Summer of 1940. The instructions were the
usual: Answer either Yes or No to the question, and answer ? only if
certain that you cannot answer either Yes or No to the question.


After all subjects in the class had finished answering the questions,
they were given the following instructions:

Turn to the attached sheets and in a sentence or two indicate why you
answered each question as you did. What did the question mean to you?
For example, in a question such as “Do you have difficulty in speaking before
a group?” you might answer either Yes or No or ? for a number of reasons.
You might answer Yes because the group you think of is a large one and you
have difficulty with large groups but not with small ones. You might answer
No because you have no difficulty with small groups and think only of the small
group and not the large ones with which you have difficulty. Or you might
answer ? because you are not sure that a small group is meant.


The same procedure was repeated on the retest. No explanation
was given for the retest except to indicate that a full explanation would
be given later; their coöperation was urged.

The interval between tests was twenty-three days for sixty-two
students and twenty-eight days for one hundred eight students. This
interval was chosen because Neprash⁸ and Benton and Stone¹ have
indicated that the percentage of changed response does not vary after a
week or two. An interval greater than two weeks was used to insure
against memory of previous responses and interpretations.

2. The Subjects.—Of the one hundred seventy subjects, seventy-six 
were men and ninety-four women. Though their ages ranged from 
sixteen to forty, most of the ages clustered about the average of 19-11. 
Most of the students were sophomores, with a scattering of students in 
upper and lower classes. In general, the group was of a lower middle 
socio-economic status. 

3. Categorization of Interpretations.—In order to deal with the inter- 
pretations statistically it was necessary to put each interpretation into 
a category. This involves several difficult problems in methodology. 
The general guiding principle was to designate a category for a clear 
Yes interpretation, one for a clear No interpretation, and one for a 
clear ? interpretation. The other categories consisted of other inter- 
pretations which were not as clearly appropriate to a single response as 
were the first three and were, therefore, accepted as logical for more 
than one response. 

Using this technique, it was found that it was impossible to achieve 
the same number of categories for each question, without forcing inter- 
pretations unwarrantedly into categories which did not fit them. For 
most questions, however, six categories were obtained; seven items had 
five categories, seventeen had six, and two had seven categories. 

Below is given the categorization of interpretations to the question,
“Do you feel that life is a great burden?” At the end of each category
is indicated the response or responses which were accepted as being
logical for that interpretation. A few illustrations of actual complete
interpretations are given to indicate how the categorization was made.
These examples are typical.


1. Feels many burdens; more burdensome than the average. (Yes)
Examples: “I feel life is a burden because my life has been a very difficult
one. I was left an orphan at an early age.”
“I don’t know whether I feel life is a great burden or whether
I just think that life doesn’t really hold so much. You try to be happy
because you tire of being gloomy. Personally, I’d like to flee from it
all.”
“I am always thinking of the unreached heights of human
perfection and therefore life is a burden with very strenuous duties.”
2. Life is pretty good; zestful; no more than most people; makes the most of life;
enjoys it. (No)
Examples: “I don’t feel that life is a great burden—in fact I find joy in
solving problems, or trying to solve them.”
“No, even if I thought it was that wouldn’t remedy matters so
I make the most I can and do get enjoyment and satisfaction from life.”
“I love life! I think life is what people make it. I try to make
mine as abundant as possible.”
“No, a challenge!”
“I can’t complain now for my hardest years were during childhood.”


3. Sometimes, depends on the situation. (?)
Examples: “I cannot answer that question truthfully because I vary with
the circumstances. There are times when I wish I was never born and
feel completely discouraged and then there are times when I’m glad I’m
alive.”
“When I’m depressed, life is a burden. If I’m happy or unexpected
pleasant things happen I like life.”


4. Have few or no burdensome situations. (No or ?)
Examples: “I have never felt that life is a great burden since I have never
had to earn my own living and I consider that one of the burdens of life.”
“Generally inclined to avoid burdening situations.”
“Because I have always been very well sheltered by my parents,
I have been more or less free from great responsibilities. Even when I
am forced to carry responsibilities, I try my best to make them lighter.”


5. Sees burdens in the future. (Yes or ?)
Examples: “Not until the present world crises have I ever thought of the
futility, the misery, the stupidity of life. I am beginning to see beyond
my own small sphere of life into a world where social and economic
conditions do make life a burden for others. These factors have somewhat
dampened my ideal, (born from inexperience) that life is good,
and amazingly interesting.”
“I don’t feel any particular burden now, but I can see that there
will be much more responsibility in the future.”


6. Not life generally, but certain aspects. (No or ?)
Examples: “I do not feel that life is a burden. At times I feel that life is
difficult but never a burden.”
“I don’t know if I’d call it a burden or unfathomable. I mean
that I find no reason for existence.”
“I enjoy life to too great an extent to ever feel that it is a burden.
Individual things may be burdensome, but not life as a whole.”


To insure greater objectivity, the formulation of the categories was 
made independently by each of the authors after reading the interpreta- 
tions through. Comparisons of the two sets of categories then were 
made. In most cases the categories were the same. The final set 
used was the result of agreement reached between the two authors. 

The consignment of each interpretation to a category was also per- 
formed independently. This was done without any reference to the 
individual’s response; only the interpretations were read. Comparison 
of the two sets of categorizations indicated a high degree of agreement. 
Of 8840 items categorized, 8122 or 91.9 per cent were categorized in 
exactly the same way by both authors. For the remaining items a 
final categorization was reached in conferences between the authors. 
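
The agreement figure can be reproduced with simple arithmetic. The sketch below is only an illustration: it assumes that the 8840 categorized interpretations are 170 subjects times 26 items times 2 testings, which is an inference from the counts reported earlier and is not stated in the text. The function and variable names are hypothetical.

```python
# Inter-rater agreement expressed as a plain percentage.
# Assumption (not stated in the text): the 8840 categorized
# interpretations are 170 subjects x 26 items x 2 testings.
SUBJECTS = 170
ITEMS = 26
TESTINGS = 2


def percent_agreement(agreed: int, total: int) -> float:
    """Return the percentage of cases on which both raters agreed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * agreed / total


total_categorized = SUBJECTS * ITEMS * TESTINGS
agreement = percent_agreement(8122, total_categorized)
print(total_categorized, round(agreement, 1))
```

A plain percentage of this kind does not correct for agreement that would occur by chance, so it is best read as an upper bound on rater reliability.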

We feel that we have achieved a high degree of objectivity through 
the use of this procedure. It may be argued that errors were made in 
that some subjects may not have given complete interpretations, or 
were handicapped by improper verbalization, or did not respond freely, 
etc. Such criticism cannot be answered unless our results are checked 
with interviews. It is our impression, however, that most of the inter- 
pretations were clear, honest, and complete enough for the purpose of 
this study. 

RESULTS 


1. Definition of Terms 


Three terms require definition: Consistent response, consistent 
interpretation, and logicality of interpretation.* 

A response is consistent when the individual responds in exactly the 
same way in both testings. Thus, if he answers Yes the first time to a 
given question, he must answer Yes the second time. 

An interpretation is consistent when the individual’s interpretation 
of a given question is placed in the same category in both testings. 
This is independent of the individual’s response; the individual may 
change his response but his interpretation must remain the same. 





* A former study² employed a measure of overlap, that is, the extent to which
individuals who respond differently to the same question give the same
interpretation. A measure of equivocality was also used, that is, the extent
to which interpretations render responses ambiguous. These measures were not
used in the present study since it was felt that they were not relevant to the
problems investigated. It may be indicated, however, that all the conclusions
of the former study have been verified.
Interpretations are termed logical if they are in accord with the
particular responses to which they led on both administrations. For
example, in answer to the question, “At a reception or tea do you avoid
meeting the important person?” the interpretation would be considered
logical if the individual answers Yes and says that he does “avoid the
important person because he feels insufficient, shy,” etc., or if he says
that he “feels superior to the person” or that “the entire situation is
inane.” If he answers ?, then his interpretation would be considered
logical if he says that avoiding the important person would “depend
upon the circumstances” or that “he has never gone to receptions or
teas.” A logical No interpretation might be that he “puts himself out
to meet the important person” or that he “neither seeks nor avoids
him.”

The above illustration indicates a logical interpretation for a single 
response only; but as used in this study interpretations are considered 
logical only when they are logical on both administrations.* The 
reason for such a definition is that we were interested in studying 
change or lack of change of response in relation to the interpretation. 
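
The three definitions can be expressed as small predicates. The following is a minimal sketch, not the authors' scoring procedure: it borrows the six categories and their accepted responses from the item “Do you feel that life is a great burden?”, and the `Answer` class, the function names, and the example subject are all hypothetical.

```python
from dataclasses import dataclass

# Responses accepted as logical for each interpretation category of the item
# "Do you feel that life is a great burden?" (taken from the category list).
ACCEPTED = {
    1: {"Yes"},
    2: {"No"},
    3: {"?"},
    4: {"No", "?"},
    5: {"Yes", "?"},
    6: {"No", "?"},
}


@dataclass(frozen=True)
class Answer:
    response: str  # "Yes", "No", or "?"
    category: int  # interpretation category assigned by the raters


def consistent_response(first: Answer, second: Answer) -> bool:
    """Same response on both testings."""
    return first.response == second.response


def consistent_interpretation(first: Answer, second: Answer) -> bool:
    """Same interpretation category on both testings, whatever the responses."""
    return first.category == second.category


def logical(first: Answer, second: Answer) -> bool:
    """Each response must fit its own interpretation on BOTH testings."""
    return all(a.response in ACCEPTED[a.category] for a in (first, second))


# Hypothetical subject: sees burdens in the future (category 5) both times,
# but answers Yes on the first testing and ? on the retest.
first, second = Answer("Yes", 5), Answer("?", 5)
print(consistent_response(first, second))        # False
print(consistent_interpretation(first, second))  # True
print(logical(first, second))                    # True
```

The example shows why the three criteria are kept separate: because category 5 accepts both Yes and ?, a changed response can still carry an unchanged and logical interpretation.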

The following illustrations of actual responses and complete
interpretations of the question, “At a reception or tea do you avoid
meeting the important person?” may serve to clarify the manner of the
determination of the logicality of interpretations, and the material with
which we were working.


Consistent Responses with Logical Interpretations. 


1. Yes—“Seem to fear that important person is so bored meeting so
many. Why one more?”
2. Yes—“Always feel to meet him is pushing and produces a false relation.”

1. No—“I rather enjoy it, if for no other reason than to see what the
important person is like.”
2. No—“I rather enjoy it.”

1. No—“I don’t go looking for them purposely but I don’t look to avoid
them.”
2. No—“I do not go looking for them but if I meet them I do not avoid
them.”

1. ?—“I have gone to very few receptions. I don’t know.”
2. ?—“I have no such experience. My actions would depend on the
others present. If most of the guests outshone me, I probably
would make myself scarce. Otherwise I could be sociable.”





* An interpretation which is not apparently logical need not be illogical since 
it is possible that the individual has not said enough to clarify his interpretation. 
These interpretations are considered logical because in both
responses the interpretations are in accord with the responses to which 
they led. 


Inconsistent Responses with Logical Interpretations. 


1. Yes—“This brings to mind going through a lot of bother to meet a
person I would never see again.”
2. No—“Do you put off meeting the important person until too late?
No, I meet him at the usual time.”

1. Yes—“I’m rather shy about meeting important people.”
2. ?—“I’ve never had the opportunity to meet an important person
but if I did I think I would have cold feet.”

1. Yes—“I get very self conscious and bashful with important people.
If I talk to the guest of honor I feel that everybody is watching
me and I don’t like that.”
2. No—“I don’t go out of my way to meet him, but I usually feel it’s a
duty to say ‘Hello’ and that’s all.”


These interpretations are considered logical because, though the 
responses change, the interpretations change in the same manner so 
that the interpretations are logical in both administrations. 


Consistent Responses Whose Interpretations Are Not Apparently Logical. 


1. No—“No. You go to a reception for the express purpose of meeting
the important person.”
2. No—“I’ve only been to two or three teas and they were all for people
I knew—such as members of the faculty or executives. This
really should be a question mark for I don’t know how I would
react at a tea for a stranger.”

1. No—“Like to meet people.”
2. No—“Depends upon how important the person is.”

1. ?—“Meeting new important people makes me feel out of place.
Self-consciousness again comes into the picture.”
2. ?—“At times, when friends are near, and if I feel confident in what I
am going to say, I’m anxious to meet important people. But on
many occasions I’m afraid I’m going to say the wrong thing, etc.”


These interpretations are not considered logical because on either 
one or both of the administrations the interpretations are not in accord 
with the response. The response remains the same, but usually in this 
group the interpretation is changed. 
Inconsistent Responses Whose Interpretations Are Not Apparently Logical.


1. ?—“I like to meet important people.”
2. No—“I always try to meet all the important people I can.”

1. No—“Far from avoiding them I like to meet them.”
2. ?—“I don’t avoid them even though I may not have enough courage
to approach them.”

1. ?—“The most important person for me is the person I stay with all
evening.”
2. No—“The most important person is the one I stay with all evening.”

1. Yes—“I do that to avoid the pomp and bother involved, but if the
person is interesting I do not hesitate.”
2. No—“If the person is obviously too crowded and if it would seem as if
he would appreciate one person less to meet then I avoid him.”

1. ?—“Sometimes I do and sometimes I don’t. If I’m without any
special friends I will try to be as unobtrusive as possible but if
I’m with a few of my own friends I’d probably be very forward.”
2. Yes—“If my friends are with me, I’d probably put on a very frivolous
giddy manner, and if I knew the person slightly. But if I was
alone I’d probably do my best to avoid him. By alone I mean
with one companion.”


These interpretations are not considered logical because on either 
one or both of the administrations the interpretations are not in accord 
with the response. The response changes, but usually in this group the 
interpretation remains the same. 


2. The Relation of Interpretation to Response 


(a) Degree of Consistency of Response.—Questions varied in con- 
sistency of response from 78.2 per cent to 95.3 per cent with an average 
consistency of 84.8 per cent for all twenty-six questions. This may be 
compared with the results obtained by Neprash⁸ who found 85.9 per
cent consistency with the Thurstone Neurotic Inventory for a two-week
interval, and 84.5 per cent with four- and eight-week intervals. Lentz⁷
obtained 80.2 per cent consistency with the Bernreuter Inventory
for a one- to four-week interval. Benton and Stone¹ found, with
the Landis and Zubin Personal Inquiry Form, that with an immediate 
retest the percentage of consistency was 92 per cent and with intervals 
of four to twenty-one days it was 81 per cent. 

A comparison in Table II of the various criteria used reveals the 
following: 84.8 per cent of the responses were consistent, 75.6 per cent 
of the interpretations were logical, 71.4 per cent of the interpretations 
were consistent. All differences between these per cents are
statistically significant. It would appear then that the most stable
characteristic of a question is the consistency of response. Whether this
stability is desirable will be discussed after further analyses of the data. 

(b) Relation of Consistency of Response to Consistency of Interpreta- 
tion.—Reference to Table I indicates that consistent responses are 
more likely to be accompanied by consistent interpretations than are 
inconsistent responses. 78.5 per cent of the consistent responses have 
consistent interpretations, whereas only 32.0 per cent of the inconsistent
responses have consistent interpretations. As is to be expected,
the greater the distance of change in response, the greater the incon- 
sistency in interpretation. There is 38.1 per cent consistency in 
interpretation where the change in response is one step (Yes to ?, 
? to Yes, No to ?, and ? to No), but there is only 20.2 per cent con- 
sistency of interpretation where the change in response covers two 
steps (Yes to No, and No to Yes). (Critical ratio 5.24.) 
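
The notion of “distance of change” treats the three responses as an ordered scale with ? lying between Yes and No. A minimal sketch of that idea follows; the numeric codes and the function name are assumptions made for illustration only.

```python
# Ordinal positions on the response scale; "?" sits between Yes and No.
# The numeric codes are illustrative assumptions, not values from the study.
SCALE = {"Yes": 0, "?": 1, "No": 2}


def change_steps(first: str, second: str) -> int:
    """Number of scale steps between the first and second response."""
    return abs(SCALE[first] - SCALE[second])


for pair in [("Yes", "?"), ("?", "No"), ("Yes", "No"), ("No", "No")]:
    print(pair, change_steps(*pair))
```

Under this coding a change between Yes and No counts as two steps, a change involving ? counts as one, and an unchanged response counts as zero.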


TABLE I.—RELATION BETWEEN CONSISTENCY OF RESPONSE AND CONSISTENCY
AND LOGICALITY OF INTERPRETATION BY PER CENT

                                Consistent   Inconsistent    Total
                                response     response        response
Consistent interpretation         78.5         32.0           71.4
Inconsistent interpretation       21.5         68.0           28.6
Logical interpretation            80.1         50.9           75.6
Illogical interpretation          19.9         49.1           24.4
Per Cent Logicality*
Consistent interpretation         91.3      [illegible]       85.5
Inconsistent interpretation       37.2         72.4           49.9











* The relationship between three variables is indicated by each of these figures. 
For example, 91.3 represents the per cent of consistent responses which have con- 
sistent interpretations that are also logical. 
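
The Total column of Table I can be checked against the other two columns. The sketch below assumes that each total is a weighted average of the consistent-response and inconsistent-response figures, with weights equal to the shares of consistent (84.8 per cent) and inconsistent (15.2 per cent) responses. The function name is hypothetical and the check is an illustration, not part of the original analysis.

```python
def pooled_percent(p_consistent: float, p_inconsistent: float,
                   share_consistent: float) -> float:
    """Weighted average of two percentages.

    share_consistent is the per cent of all responses that were consistent;
    the remainder is taken to be inconsistent.
    """
    w = share_consistent / 100.0
    return w * p_consistent + (1.0 - w) * p_inconsistent


SHARE_CONSISTENT = 84.8  # per cent of responses unchanged on retest

# Rows of Table I: consistent interpretation and logical interpretation.
total_consistent_interp = pooled_percent(78.5, 32.0, SHARE_CONSISTENT)
total_logical_interp = pooled_percent(80.1, 50.9, SHARE_CONSISTENT)

print(round(total_consistent_interp, 2), round(total_logical_interp, 2))
```

Both results fall within rounding error of the reported totals of 71.4 and 75.6 per cent, which supports reading the table columns in this way.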


(c) Relation of Consistency of Response to Logicality of Interpreta- 
tion.—The above analysis merely indicates that a consistent response 
is usually consistently interpreted but it does not indicate whether 
this interpretation is logical. This is a more crucial question. Inves- 
tigation reveals (Table I) that consistent responses are more frequently 
accompanied by logical interpretations (80.1 per cent) than are incon- 
sistent responses (50.9 per cent). In other words, four-fifths of the 
consistent responses are interpreted in the expected fashion, as
contrasted with only one-half of the inconsistent responses. Inconsistent
responses are therefore less reasonable. These results do not support 
Landis’ hypothesis that changes in response are due to concomitant 
changes in interpretation. The hypothesis explains only half of the 
changed responses; the other half of the changed responses remain 
unexplained. The corollary of the hypothesis, that unchanged 
responses are due to unchanged interpretations, is more true. Only 
one-fifth of the consistent responses remain unexplained by it. 

Table I reveals also that the clearest way to understand the relation- 
ship between response and interpretation is in terms of the logicality 
of response. As we should expect, a consistent interpretation is 
logical for a consistent response whereas it is not logical for an interpre- 
tation to remain the same while the response is changed. 


3. Comparison of Good and Poor Items 


Since half of our questions consisted of previously demonstrated 
“good” items, and the other half “poor” items, it was possible to 
compare the two groups in various respects to examine factors which 
might explain the difference. 


TABLE II.—COMPARISON BETWEEN GOOD AND POOR ITEMS FOR VARIOUS CRITERIA

                         First testing         Second testing      Consistent  Consistent  Logical
                                                                   response    interpre-   interpre-
                         Yes     No      ?     Yes     No      ?               tation      tation
Good items               32.9   58.0    9.1   27.8   62.0   10.2     83.0        71.4        72.8
Poor items               13.8   75.4   10.8   12.1   77.7   10.2     86.6        71.5        78.4
Total                    23.3   66.7   10.0   20.0   69.8   10.2     84.8        71.4        75.6
Critical ratios between
good and poor items     18.55 [illegible] 1.91 18.92  15.24   0.00     3.24        0.00        4.34

















The clearest difference between the two groups of questions was 
found to be the incidence of neurotic response. (See Table II.) It 
will be recalled that Yes was in all cases considered the neurotic 
response. The good items had the greater incidence of neurotic 
response, and the smaller incidence of non-neurotic response. No 







332 The Journal of Educational Psychology 


differences in frequency of ? response were found. In the first testing 
32.9 per cent of the responses to the good items were Yes, whereas only 
13.8 per cent of those to poor items were Yes. The relationship is 
very much the same in the second testing, with but a slight reduction 
in the number of Yes responses.

The greater incidence of non-neurotic responses in the poor items 
seems to account for all other differences that are found between good 
and poor items. There is a greater consistency of response in the 
poor items, 86.6 per cent, than in the good items, 83 per cent. This 
difference is statistically significant. There is greater logicality of 
interpretation in the poor items, 78.4 per cent, than in the good items, 
72.8 per cent. There is no difference between good and poor items in 
relative consistency of interpretation. These facts will be pertinent 
to the hypotheses proposed in the Discussion. 
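
Because the good and poor groups each make up half of the items and were answered by the same subjects, each overall figure in Table II should equal the simple mean of the good-item and poor-item figures. The sketch below checks this for the clearly legible entries; the equal weighting is an assumption drawn from the statement that half of the questions were good items, and the names are hypothetical.

```python
# (good items, poor items, reported total), all in per cent, from Table II.
ROWS = {
    "Yes, first testing": (32.9, 13.8, 23.3),
    "No, first testing": (58.0, 75.4, 66.7),
    "?, first testing": (9.1, 10.8, 10.0),
    "Consistent response": (83.0, 86.6, 84.8),
    "Logical interpretation": (72.8, 78.4, 75.6),
}


def max_discrepancy(rows: dict) -> float:
    """Largest gap between the mean of the two groups and the reported total."""
    return max(abs((good + poor) / 2.0 - total)
               for good, poor, total in rows.values())


for label, (good, poor, total) in ROWS.items():
    print(f"{label}: mean {(good + poor) / 2.0:.2f}, reported {total}")
print("largest discrepancy:", round(max_discrepancy(ROWS), 2))
```

The largest discrepancy is about 0.05 percentage points, which is what rounding to one decimal place would produce.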


4. Comparison of First and Second Testings 


Whether there is much difference in the individual’s response and 
interpretation at different test administrations is an important con- 
sideration. In this study, there appears to be a small difference which 
is statistically and theoretically significant. 

We find that 15.2 per cent of the responses and 28.6 per cent of the 
interpretations change. Reference to Table II reveals a decrease in 
Yes responses, from 23.3 per cent to 20 per cent, and an increase in 
No responses, from 66.7 per cent to 69.8 per cent. These, though 
slight, are statistically significant differences. There is virtually no 
change in the ? response, from 10 per cent to 10.2 per cent. 

These data confirm the results of Neprash⁸ and Hertzman and
Gould. They further suggest the pull of social norms in response. 
The No responses are very obviously more socially desirable and 
probably with the gaining of greater insight into the nature of the test, 
there is a tendency upon the part of the individual to answer with the 
more desirable responses. This hypothesis will be considered more 
fully in the Discussion. 


5. Comparison of Yes, No, and ? Responses 


Table III reveals the following differences in the Yes, No and ? 
responses: The No (non-neurotic) response has the greatest consistency 
of response, greatest consistency in interpretation, and the greatest 
amount of logicality of interpretation. The ? response has least of 
these characteristics, except that it differs very little from the Yes 








Consistency of Response and Logical Interpretation 


333 


TABLE III.—COMPARISON BETWEEN YES, NO, AND ? RESPONSES FOR VARIOUS
CRITERIA

                                Yes            No              ?            Critical ratios
                              N*    Per     N*    Per     N*    Per
                                    cent          cent          cent      Y-N       Y-?     N-?
Consistent response          957   79.6    3017   91.5    446   50.2      8.50     10.89   17.07
Consistent interpretation    957   67.2    3017   75.1    446   56.1      4.62      3.96    7.66
Logical interpretation       957   66.9    3017   79.8    446   66.8   [illegible]  0.04    5.53








* The total number of cases consists of the average of the two testings. 


response in extent of logicality of interpretation. These results agree
with Lentz⁷ who found that 17.4 per cent of the Yes responses, 15.8 per
cent of the No responses, and 68.2 per cent of the ? responses were
inconsistent.
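
The text does not say how the critical ratios were computed. A minimal sketch follows, assuming the ratio is the difference between two independent percentages divided by the unpooled standard error of that difference. Under that assumption the consistent-response figures of Table III (79.6 per cent of 957 Yes responses, 91.5 per cent of 3017 No responses, 50.2 per cent of 446 ? responses) give values close to the reported 8.50, 10.89, and 17.07. The function name is hypothetical.

```python
import math


def critical_ratio(p1: float, n1: int, p2: float, n2: int) -> float:
    """Difference between two percentages over its standard error.

    Assumes independent samples and an unpooled standard error:
    sqrt(p1*q1/n1 + p2*q2/n2), with p and q as proportions.
    """
    a, b = p1 / 100.0, p2 / 100.0
    se = math.sqrt(a * (1.0 - a) / n1 + b * (1.0 - b) / n2)
    return abs(a - b) / se


# Consistent-response row of Table III: (per cent, N).
yes, no, question = (79.6, 957), (91.5, 3017), (50.2, 446)

cr_yes_no = critical_ratio(*yes, *no)
cr_yes_q = critical_ratio(*yes, *question)
cr_no_q = critical_ratio(*no, *question)
print(round(cr_yes_no, 2), round(cr_yes_q, 2), round(cr_no_q, 2))
```

The agreement with the table suggests, but does not prove, that this was the formula used; the same formula does not reproduce every ratio reported elsewhere in the paper.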


TABLE IV.—RANK DIFFERENCE CORRELATIONS BETWEEN ITEMS*

                                                          Consistent  Consistent  Logical
                                                          response    interpre-   interpre-
                                 Y         N        ?                 tation      tation
Y                              ....      -.90     -.05      -.63        -.22        -.67
N                           [illegible]  ....     -.12       .60         .23         .62
?                               .20      -.52     ....      -.10         .19        -.09
Consistent response            -.49       .64     -.45      ....         .36         .28
Consistent interpretation      -.21       .26     -.08        X         ....         .05
Logical interpretation         -.57       .56     -.25        X           X         ....























* Correlations above the diagonal are for the first testing; below the diagonal, 
for the second testing. 


Table IV presents the same results in another way. This table 
presents the rank-difference correlations between items for the various 
criteria. With our data a correlation of .45 is four times its probable
error, .11. Accepting a correlation of .45 or above as significant when
the correlation is consistent in both testings, the following
relationships were found: The more frequently an item is responded to with
a non-neurotic response, the more likely is it to be interpreted con- 
sistently and with a greater degree of logicality. 
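
The probable error quoted for a correlation of .45 can be approximated with a classical textbook formula. The sketch below assumes the formula 0.7063(1 - rho^2)/sqrt(N) for a rank-difference coefficient and takes N to be the twenty-six items. Neither the formula nor the value of N is stated in the text, so both are assumptions, and the function name is hypothetical.

```python
import math


def probable_error_rho(rho: float, n: int) -> float:
    """Classical probable error of a rank-difference correlation.

    Assumed formula: 0.7063 * (1 - rho**2) / sqrt(n).
    """
    return 0.7063 * (1.0 - rho ** 2) / math.sqrt(n)


N_ITEMS = 26  # assumption: correlations are taken across the 26 items
pe = probable_error_rho(0.45, N_ITEMS)
ratio = 0.45 / pe
print(round(pe, 2), round(ratio, 1))
```

With these assumptions the probable error rounds to .11 and the correlation is slightly more than four times its probable error, in line with the criterion adopted in the text.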
We have found that the poor items have the greater incidence of No
response. We have found also that the No response is characterized 
by greater consistency in response, greater consistency in interpreta- 
tion and greater amount of logicality. Because the poor items have 
the greater incidence in No response, they therefore have greater con- 
sistency in response and greater consistency and logicality of interpre- 
tation. These relationships are important to the hypotheses in the 
Discussion. 


SUMMARY AND DISCUSSION 


1. Relation of Response to the Interpretation.—We have found the 
following: (1) 84.8 per cent of the responses were consistent, 71.4 per 
cent of the interpretations were consistent, and 75.6 per cent of the 
interpretations were logical. (2) Consistent responses are more 
likely to be accompanied by consistent interpretations than are incon- 
sistent responses. (3) Four-fifths of the unchanged responses have 
logical interpretations, but only one-half of the changed responses are 
interpreted logically. 

It appears from the above that consistent responses are usually 
interpreted consistently and logically. Since most (85 per cent) of 
the responses are consistent it would appear that questionnaires are 
reliable at least within the scope considered. However, it should be 
remembered that one-fourth of all the responses change without any 
“logical” reason or remain consistent with as little logic. In addition,
it will be found from the discussion below, that consistency in itself 
does not validate questionnaires. 

Our data do not support Landis’ hypothesis that the response is 
changed due to a change in the interpretation of the question. At 
most, only one-half the changed responses (50.9 per cent) could be 
accounted for by the hypothesis. We cannot offer any hypothesis, 
from our data, to explain why certain responses are consistent and 
why certain others are inconsistent. An attack on this problem 
can probably be advantageously made by means of the interview 
technique. 

2. Comparison of Good and Poor Questions.—The clearest difference 
between the items is that the good items are characterized by a greater 
incidence of neurotic response. All other differences, such as greater 
consistency of response and greater logicality of interpretation in the 
poor items, have been demonstrated to be a function of the lesser inci- 
dence of neurotic response. This conclusion is further supported by 
the data on the differences between the first and second testing, and
the characteristics of the No response as differentiated from the Yes. 

We are therefore faced with the paradoxical conclusion that the 
“best” items are those which are relatively more inconsistent and not
apparently logical in interpretation. Neprash⁸ was faced with a similar
conclusion when he found that the most discriminating items in the
Thurstone Neurotic Inventory were those in which the responses
changed the most. He offered no explanation of this result, even
though he found that items with low neurotic incidence are the most 
consistent in response. Hertzman and Gould⁵ arrived at the same
conclusions using the items which Garrett and Schneck had selected 
from the Woodworth Personal Data Sheet. 

Previous investigators have been faced with this problem either 
directly or indirectly. As a result, several hypotheses have been 
offered in explanation. 

Though Willoughby¹⁰ and Harvey⁴ disagree on several other
matters, they do agree that “the trait measured by the scale (neuroticism)
exhibits, so far as it can be measured, more extreme and less
frequent deviations in the maladjusted than in the adjusted direction,
and that this is not, as might be hypothecated, a consequence of the
downward limitations of the scores by zero, but a real result of the
relative infrequency of the symptoms or their recognition.” (10 p. 403.)
Harvey, however, points out that if a psychoneurotic symptom occurs
very infrequently “it cannot be a very characteristic neurotic symptom.”
In other words, a question which has a low neurotic incidence
cannot be a good question since it deals with too small a proportion 
of the population to be a good indicator of individual differences. 

A more sophisticated hypothesis is presented by Hertzman and 
Gould: 


Assuming that our items are reasonably validated as a whole, we find that 
those items answered most frequently in the neurotic direction are the ones for 
which the responses of our subjects are most frequently changed. This finding 
seems to be somewhat consistent with the neurotic symptom pattern which 
tends to variability. Those items which are most frequently aspects of the 
pattern are the ones most frequently interchanged and, therefore, the ones 
most frequently altered. We cannot eliminate the judgment aspect of the 
response as shown by changes even in “factual” questions; furthermore, the
specific sets of an individual operating at different times may affect his 
judgments. Responses which in the general population rarely appear as 
symptoms will, therefore, rarely be changed. A factor making for change 
might be the variable attitude of the subject towards the symptom, an attitude
which would enable him to recognize it at one time and not at another. 
(5 p. 349.) 


In other words, Hertzman and Gould include Harvey’s hypothesis 
but also indicate that a good item is one which is characteristic of the 
neurotic pattern and consequently is also variable. In addition, they 
indicate that change in judgment may also affect the response. This 
last statement finds some support in our data. 

Frank,³ using the Bernreuter Inventory, found that positive and
negative questions (those which are answered either Yes or No by
75 per cent or more of the group) were more consistent than those
questions which were neutral (that is, answered Yes or No by less
than 75 per cent of the group).* He hypothesizes as follows:


The tentative conclusion is that agreement among subjects with respect to 
certain responses to a questionnaire is closely related to stability of response. 
If the positive and negative groups of items reflect behavior which is socially 
approved and disapproved, as these groups of questions appear to do, then 
the question arises as to what extent agreement of response and stability of
response is a function of the crystallization of convention and social practice
and what part knowledge and judgment of socially approved practices play
in the motivation of questionnaire responses. (3 p. 323.)


In other words, poor items are those which are answered more in terms 
of socially approved practices than in terms of the individual’s true 
response. 

Our data do not invalidate any of these three hypotheses. Nor 
do they lend exclusive support to any one of them. However, none 
of these hypotheses appears to give a sufficiently comprehensive 
picture. 

It is questionable, though not impossible, that items with low 
neurotic incidence are so because these symptoms are infrequent in 
the population. It is more likely that individuals are less willing to 
admit possession of such symptoms. 

A careful examination of our good and poor items will reveal that
the good items are so worded that many individuals may admit to
possession of such characteristics without being socially scorned.
Not many people, however, would be willing to admit that they rarely
laugh, that they dislike intensely many people, that they usually
distrust people, or that they are not good sports when they lose in a
competitive game. An examination of Lentz’s⁷ analysis of those items
in the Bernreuter Inventory which change most frequently and those
that are very stable in response will reveal the same characteristic
differences.

* Though it is not apparent from Frank’s report, it is probable that where the
large majority responded with Yes or No to these questions that response is the
non-neurotic response. In the Bernreuter Inventory the Yes response is neurotic
for some questions and not for others.

It is probable, however, that the variability in response is not only 
due to the wording of the question, but also to the subject’s attitude 
toward the symptom (if he possesses it) and to the social situation 
involved in the question. Assuming that the subject possesses the 
symptom, he will say Yes or No to the question depending on his 
recognition of the symptom, on his interpretation of the situation 
involved in the question, and on his willingness to admit possession 
of it even if he does recognize it. 

It appears that many factors account for the variability in response. 
One of these factors is the social approval or disapproval of the symp- 
tom. The larger the feeling of social disapproval toward a symptom 
the smaller will be the variability of response since few people will be 
willing to admit to possession of the symptoms even once. This 
apparently resolves the “paradox” that a discriminating question has 
greater variability in response than a non-discriminating question. 

But every item that has a large degree of variability is not neces- 
sarily a discriminating item. We have indicated that consistency of 
response, though desirable, is not necessarily related to the discriminat- 
ing value of a question. Moreover, a large degree of inconsistency of 
response is not desirable in a question either, since it leaves uncertain 
the measurement of the individual’s adjustment. 

For the resolution of this problem we should like to suggest two 
lines of research. (1) In the construction of psychoneurotic inven- 
tories, it is apparent that questions should be so worded that social 
approval plays a relatively minor rôle. Direct analysis should be
made of the desirability or the lack of desirability of various forms of
behavior for various groups. Probably the social value will differ 
with different groups. Neutral questions which do not have a pre- 
dominance of any response need to be selected. Once such neutral 
questions have been discovered, the next task will be to discover which 
wording of the question produces the greatest degree of consistency, 
logicality, etc. 

(2) The above research should advance our understanding of 
questionnaires, but until we understand the individual’s interpretation 
of his response and the motivation prompting that response, the
questionnaire will remain an unsatisfactory instrument for the meas- 
urement of psychoneurotic traits, useful though it is for other purposes. 
A technique should therefore be devised whereby both the individual’s 
responses and the attitude toward those responses can be obtained. 
CLASSROOM BEHAVIOR PROBLEMS ENCOUNTERED
IN ATTEMPTING TO TEACH ILLITERATE DEFECTIVE
BOYS HOW TO READ¹


CHARLES L. VAUGHN 


University of Arizona 


The main purpose of this investigation was to ascertain what
disturbing classroom reactions might be attributed to the attempt to 
teach reading and related subjects to illiterate, defective, adolescent 
boys. The interest directing this research was not concerned with 
describing the longitudinal personalities of poor readers, but was 
centered upon discovering how they react to the academic situation in 
which reading and similar subjects are taught. This study was also 
designed as a simple test of the assumption made by Dollard and 
others² that aggression inevitably follows frustration.

Although the results of various studies of the relationship between 
reading disability and problem behavior are not clear-cut, they do 
indicate that children with relatively normal intelligence who cannot 
read often present serious personality problems. Monroe and Backus³
and Sherman⁴ summarized a number of these investigations. The
present inquiry, however, was concerned with children having inferior 
intelligence. They had been committed to an institution for the 
training and rehabilitation of higher level mentally defective boys and 
girls. 

PROCEDURES 


Two groups of boys were selected from the population of the Wayne 
County Training School. Group I consisted of twenty-eight boys who 
read below grade 3.0. Group II contained twenty-eight boys paired 
with Group I on the basis of chronological age, intelligence quotient, 
and consequently mental age, who read better than grade 3.0. The 
reading grade assigned to each case was determined by the average of
his score on the Gates Primary Reading Tests, Types 2 and 3, for
children reading below grade 3.0, and by the New Stanford Reading
Test for children reading above grade 3.0. Intelligence quotients were
determined by the Stanford Revision of the Binet-Simon Intelligence
Examination, and the mental age corrected to the date of the reading
test, which was given not more than a year following the mental test.
An attempt was made to select those cases for Group I who showed a
significant reading-grade-mental-age-grade discrepancy, as computed
by Monroe, in favor of mental age; the reverse was true for Group II.
The children were then rated on the Haggerty-Olson-Wickman Behavior
Rating Schedules by two groups of teachers. The subjects were
rated in the academic rooms by their teachers in these classes, and
in the handwork rooms by their teachers in these rooms.

¹ Thanks are due the Wayne County Training School at Northville, Michigan,
for use of their records, and Dr. Thorleif G. Hegge, who read portions of this
manuscript. The study was initiated while the author was a member of the
staff of that institution.

² Dollard, John, et al.: Frustration and Aggression, 1939.

³ Monroe, Marion and Backus, Bertie: Remedial Reading, A Monograph in
Character Education, 1937.

⁴ Sherman, Mandel: “Emotional Disturbances and Reading Disability.”
Recent Trends in Reading, 1939.

The essential differences between the two rooms are to be found 
in the subjects taught and in the amount of restriction on the child’s 
behavior. Approximately one-half of the school day is devoted to the 
academic subjects, the other half to manual and industrial arts and 
special subjects. In the academic rooms, which are ungraded, the 
children are taught the customary elementary subjects following a 
unit procedure with some emphasis upon drill. In the handwork 
rooms the children are taught rug-making, crocheting, woodwork, sheet 
metal work and other handicrafts. In the handwork rooms the chil- 
dren are not confined to the traditional schoolroom desks so much as 
they are in the academic rooms. 

Means and ranges of the two groups relative to the following five
factors are described in Table I: chronological age, Stanford-Binet
IQ, mental age, reading grade, and the discrepancy between reading
grade and the grade expected from the children’s mental age. From
this table it is apparent that the two groups are equated on all factors
except reading grade and the factor which is designated “discrepancy
between reading grade and MA grade expectancy.” These, of course,
are the two variables which are to be studied relative to problem
behavior. The two groups are composed of younger adolescent, higher
level mentally defective boys; but Group I has a reading grade average
of 2.2 and a mental age expectancy 1.6 grades higher than this. Group
II, on the other hand, has an average reading grade of 4.2, which is 0.3
grades higher than the mental age expectancy. We shall call Group I
the “poor readers” and Group II the “good readers.” Both groups
started to school in the community and the individual members were
admitted to the Training School at various times prior to the investigation.
It can be stated with reasonable accuracy that both groups
have been in school for seven years, Group I with little or no success in
academic subjects, and Group II with appreciable success, the latter
group certainly with as much success as might be expected from their
mental age level.


TABLE I. DESCRIPTION OF SUBJECTS

Characteristics of two                 Chronological   Stanford-   Mental        Reading      Discrepancy between
experimental groups (N = 28)           age             Binet IQ    age           grade²       reading grade and MA
                                                                                              grade expectancy³

Group I. Children reading      Mean    13.8            64          8.8           2.2          1.6
below grade 3.0.               Range   12.7 to 15.0    50 to 74    7.1 to 11.1   1.4 to 2.9   0.2 to 4.1

Group II. Paired¹ children     Mean    13.9            64          8.9           4.2          -0.3
reading above grade 3.0.       Range   12.3 to 14.9    51 to 76    7.3 to 11.3   3.0 to 5.6   -2.3 to +1.3

¹ Paired on basis of IQ and CA.

² For children reading above grade 3.0, this score is average of two reading grades on New
Stanford Achievement Test; for children reading below grade 3.0 this score is average of two reading
grades on Gates Primary Reading Tests, Types 2 and 3.

³ Computed by Monroe’s method, assuming that a mental age of six represents a reading grade
expectancy of one; a mental age of seven, a reading grade of two, and so on.
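
The expectancy rule in the note above amounts to subtracting five from the mental age. A minimal sketch in Python follows; it assumes that mental ages are expressed in years and tenths and that the discrepancy is taken as expectancy minus reading grade. The function names are illustrative and are not part of Monroe's method. Applied to the group means in Table I, it gives the tabled discrepancies.

```python
def reading_grade_expectancy(mental_age: float) -> float:
    """Monroe's rule as described in the note: MA 6 -> grade 1, MA 7 -> grade 2."""
    return mental_age - 5.0


def discrepancy(mental_age: float, reading_grade: float) -> float:
    """Positive values mean the child reads below the grade expected from MA."""
    return reading_grade_expectancy(mental_age) - reading_grade


# Group means taken from Table I: (mental age, reading grade).
group_means = {"Group I": (8.8, 2.2), "Group II": (8.9, 4.2)}

for name, (ma, rg) in group_means.items():
    print(name, round(discrepancy(ma, rg), 1))
```

Because the rule is linear, the discrepancy of the group means equals the mean of the individual discrepancies, so the check can be run on the tabled means alone.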


It is now possible to make the following three comparisons: We can 
compare the adjustment of the poor readers (Group I) in handwork 
rooms with their adjustment in academic rooms; it is also possible to 
compare the adjustment of poor readers with that of good readers 
(Group II) in the academic situation; and we can compare the adjust- 
ment of good readers in the academic rooms with their adjustment
in the handwork rooms. The three comparisons of adjustment were 
made in terms of ratings on the Haggerty-Olson-Wickman Behavior 
Rating Schedules. The poor readers have had a long history of 
failure, or frustration, in academic instruction; they have presumably 
met with more success at manual activities. The good readers, on the 
other hand, have experienced considerably more success, or less 
frustration, in scholastic activities than have the poor readers and are 
presumably as successful as the poor readers at handwork. 

Schedule A of the Haggerty-Olson-Wickman Behavior Rating 
Schedules lists fifteen behavior problems, and the teacher is asked to 
check the frequency of occurrence of each problem for the child.
There are four categories which he can check, viz., “has never occurred,”
“has occurred once or twice but no more,” “occasional occurrence,”
and “frequent occurrence.” The authors of the scale have assigned
various weights to these categories depending upon the frequency with 
which they were checked for any one problem in the standardizing 
population. For the purposes of the present study, however, these 
weights were disregarded. Weights of zero, one, two, three, and four 
were given to the four categories in order named above, and means 
for the two groups are the averages of these weights. For example, the 
score of a child showing no disinterest in school work would be zero for 
this problem; a child who shows frequent disinterest in school work 
would be given a score of four. Tardiness and truancy from classes 
are not special problems at the Training School since the children are 
taken to class together from their cottages, and these two categories 
are omitted, leaving thirteen problems for which the comparisons are 
made. 

Schedule B consists of a graphic rating scale for each of thirty-five 
intellectual, physical, social and emotional traits. Below the scale for 
each trait appear five descriptive phrases to assist the rater in making 
a quantitative judgment. The amount of each trait in Schedule B 
has been assigned a weighting in terms of its relationship to Schedule A. 
A weight of five indicates extreme maladjustment, a weight of zero 
normal adjustment. Means on each trait for the groups in this study 
were obtained, using the authors’ weights for each of thirty-four 
variables. High averages on Schedule B thus represent undesirable 
tendencies for the group; low averages, desirable tendencies. It 
should be emphasized that ratings assigned to the children reflect 
teachers’ attitudes as well as the child’s behavior. 

Differences between means and critical ratios were then computed 
for each item of the two schedules, and the comparisons indicated above 
were then made. The small number of cases made it necessary to use 
Fisher’s method for determining the standard errors of the means. 


The formula is:²

\[ \sigma_M = \sqrt{\frac{\sum x^2}{N(N-1)}} \]











¹ Since the children were paired on the basis of intelligence, item 1, “Intelligence,”
was omitted, leaving thirty-four items for the comparisons on Schedule B.
² Guilford, J. P.: Psychometric Methods, p. 51.













The following formula was used for computing the standard errors of 
the differences between means: 





\[ \sigma_{D_M} = \sqrt{\sigma_{M_x}^2 + \sigma_{M_y}^2 - 2 r_{xy}\,\sigma_{M_x}\sigma_{M_y}} \]


For certain comparisons the correlational term would obviously 
vanish. Fisher’s “t” was obtained by dividing the difference between
means by the standard error of the difference, and its significance was
determined from Guilford’s tables of the significance of “t’s.”¹
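
The computation just described can be sketched in Python. The sketch assumes that the small-sample standard error of a mean is the square root of the sum of squared deviations divided by N(N - 1), and that the correlational term is dropped for comparisons between unpaired sets of ratings. The function names and the ratings in the usage lines are invented for illustration and are not data from this study.

```python
import math
from statistics import mean


def se_mean(scores):
    """Small-sample standard error of a mean: sqrt(sum(x^2) / (N(N - 1)))."""
    n = len(scores)
    m = mean(scores)
    return math.sqrt(sum((x - m) ** 2 for x in scores) / (n * (n - 1)))


def pearson_r(xs, ys):
    """Product-moment correlation between two paired sets of ratings."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_for_difference(xs, ys, correlated=True):
    """Difference between means divided by the standard error of the difference."""
    se_x, se_y = se_mean(xs), se_mean(ys)
    r = pearson_r(xs, ys) if correlated else 0.0  # correlational term may vanish
    se_diff = math.sqrt(se_x ** 2 + se_y ** 2 - 2 * r * se_x * se_y)
    return (mean(xs) - mean(ys)) / se_diff


# Invented ratings for six children in two rooms (illustration only).
academic = [3, 2, 4, 1, 3, 2]
handwork = [1, 2, 2, 0, 1, 1]
t_value = t_for_difference(academic, handwork)
print(round(t_value, 3))
```

With the correlational term included, the result is the same as a t computed directly on the paired differences. The sketch does not guard against a zero standard error, which would arise if every pair of ratings differed by the same amount.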


RESULTS AND INTERPRETATIONS 


The first question asked is: “Do poor readers cause more disturbances
in academic rooms than they do in handwork rooms?” Table II
answers this question. There are significantly more problems as
checked on Schedule A in the academic rooms than there are in the
handwork rooms. With twenty-seven degrees of freedom (twenty-eight
cases) a “t” of 2.052 is significant, a “t” of 2.771 is highly
significant. Specifically, “disinterest in school work,” “cheating,”
“defiance to discipline,” “marked overactivity,” “temper outbursts,”
and “speech difficulties” are problems
encountered more frequently when these children who read very poorly
are taught reading and related subjects than when they are taught
subjects entailing predominantly manual effort.

The next question is: “Do poor readers cause more problems in the
academic rooms than do good readers?” The answer to this query is
in the affirmative, as evidenced by the differences in Table II, although
the differences are not as reliable as those between academic and handwork
for the poor readers. Problems more pronounced among the poor
readers than among the good readers are “disinterest in school work,”
“marked overactivity,” “temper outbursts,” and “speech difficulties.”
Whereas on the problems “stealing” and “obscene notes, talks,
pictures, etc.” the poor readers seemed to be better adjusted in the
academic rooms than they were in the handwork rooms, the poor
readers are no better adjusted in any category in the academic rooms
than are the good readers.

A third comparison is important. It may be that all defectives
are better adjusted to handwork than they are to academic activities.
If this is the case, then the fact that poor readers are better adjusted
to handwork rooms than they are to academic rooms may reflect only
this general situation, and not be due to their specific retardation in
reading and related subjects. To test this possibility, a comparison
was made of the adjustment of good readers in the two situations.
Surprisingly, it was found that the children who did well in reading
were if anything slightly better adjusted to the academic rooms than
they were to the handwork rooms. In other words, this finding
indicates that a defective child achieving his maximum scholastically
will be as well adjusted to academic work as he will be to manual work,
and that he will meet with no more frustration in one line of effort than
in the other. This result does not mean, of course, that the defective
child is no more of a problem in school than the normal child, for he
probably is.

¹ Ibid., pp. 548-552.

TABLE III. PROBLEM BEHAVIOR RELATED AND NOT RELATED TO THE ATTEMPT
TO TEACH READING AND RELATED SUBJECTS TO ADOLESCENT ILLITERATE
DEFECTIVE BOYS: SCHEDULE A¹

PROBLEM BEHAVIOR RELATED          PROBLEM BEHAVIOR NOT RELATED
TO RETARDATION¹                   TO RETARDATION
Disinterest in school work*       Cheating
Defiance to discipline            Lying
Marked overactivity*              Unpopular with children
Temper outbursts*                 Imaginative lying
Bullying                          Sex offenses
Speech difficulties*              Stealing
Total of behavior problems*       Obscene notes, talks, etc.

* Starred items are those with “t’s” greater than 2.0.

¹ Criterion of relation between retardation in reading and occurrence of problem
behavior: The following three conditions must be fulfilled concurrently: (a)
Problem behavior more frequent among poor readers (Group I) in academic than
in handwork rooms; (b) problem behavior of good readers (Group II) no more
frequent in academic than in handwork rooms; (c) problem behavior more frequent
among poor readers in academic rooms than among good readers in academic
rooms. (See Table II.) A “t” of 1.35 or more tentatively considered significant
for purposes of study.


In order to isolate those problems which might be attributed 
specifically to the attempt to teach academic subjects to the boys who 
read poorly, a study was made of the problems checked more often for 
them in the academic rooms than in the handwork rooms, more often 
for them in the academic rooms than for the good readers in these 
rooms, and no more frequently for the good readers in academic than 
in handwork rooms. This part of the study is a summary of the 
previous comparisons. Table III shows the problems related and not 
related by all of these criteria to the attempt to teach reading and 
related subjects to these boys. Since three conditions were fulfilled
simultaneously in this aspect of the study, smaller “t’s” were tentatively
assumed to be significant. No formula was available for determining
the actual probability that these differences would occur by
chance. 
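
The three-fold criterion can be written out as a small decision rule. The sketch below is one possible reading of it in Python: it assumes each “t” is signed so that positive values mean more problem behavior in the first-named condition, and it treats “no more frequent” for the good readers as a “t” below the tentative cutoff of 1.35. The class name, field names, and the two sets of values are hypothetical and are not figures from Table II.

```python
from dataclasses import dataclass

T_CUTOFF = 1.35  # tentative cutoff stated for this part of the study


@dataclass
class ProblemTs:
    """Signed t values for one problem (positive = more frequent in first-named condition)."""
    poor_academic_vs_handwork: float  # poor readers: academic vs. handwork rooms
    good_academic_vs_handwork: float  # good readers: academic vs. handwork rooms
    poor_vs_good_academic: float      # academic rooms: poor vs. good readers


def related_to_retardation(ts: ProblemTs, cutoff: float = T_CUTOFF) -> bool:
    """True only when all three conditions are fulfilled concurrently."""
    a = ts.poor_academic_vs_handwork >= cutoff
    b = ts.good_academic_vs_handwork < cutoff  # assumed reading of "no more frequent"
    c = ts.poor_vs_good_academic >= cutoff
    return a and b and c


# Hypothetical values, for illustration only.
example_related = ProblemTs(2.9, -0.4, 2.1)
example_unrelated = ProblemTs(2.3, 0.2, 0.6)
print(related_to_retardation(example_related))
print(related_to_retardation(example_unrelated))
```

A problem that is more frequent among poor readers in the academic rooms, yet equally frequent among good readers there, fails condition (c) and is classed as not related to retardation.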

The following problems on Schedule A stand out when the three
criteria are imposed: “disinterest in school work,” “defiance to
discipline,” “marked overactivity,” “temper outbursts,” “bullying,”
“speech difficulties,” and “total of behavior problems” on Schedule
A. If a “t” of 2.0 or more is required for significance, the following
problems are still more frequent: “disinterest in school work,”
“marked overactivity,” “temper outbursts,” “speech difficulties,” and
“total of behavior problems.” “Cheating,” “lying,” “unpopularity
with other children,” “imaginative lying,” “sex offenses,” “stealing,”
and “obscene notes, talk and pictures” do not seem to be related to
retardation in reading.

The classroom problems related to retardation in reading can be 
conceived of as forms of aggression. Temper outbursts, defiance to 
discipline, bullying and marked overactivity in particular are immedi- 
ately recognized forms of aggression. Disinterest in school work may 
not be an obvious form of aggression, but undoubtedly any child 
markedly overactive, defying discipline, showing temper outbursts, or 
evidencing most of the other problems would be considered uninter- 
ested in school work. Although defiance to discipline and bullying 
are not outstanding problems among poor readers in the academic 
situation, they seem to be more conspicuous and are certainly forms of 
aggression, in one case against the teacher and in the other against the 
other children. From these findings it would appear that the object 
of frustration is not only the reading itself but the teacher and the other 
children in the classroom. 

It is curious that speech difficulties are more pronounced among 
the non-readers in academic rooms than they are in the handwork 
rooms. This fact emphasizes that problem behavior is not an inde- 
pendent variable but is a function of the teacher’s perception of the 
behavior. That is, in the reading situation children are called on to 
talk more than they are in manual training, and speech problems, 
therefore, become more noticeable. It is a common finding that speech 
cases more often show reading handicaps than children who talk 
normally. The cause for this is not clear, but it is probably true that 
a speech impediment is just as frustrating as a reading handicap to the 
child in the situation in which he has to talk more, and the aggressive 
reactions found in these children are undoubtedly due in part to talking 
with a speech handicap. It would be interesting to speculate about
the extent to which a reading handicap with consequent failure and 
frustration would be conducive to speech problems. 

Although Schedule B of the Haggerty-Olson-Wickman Scales con- 
tains several items almost synonymous with reading failure, it was 
thought advisable to make the same comparisons as with Schedule A. 
Of the four major divisions in Schedule B, only on Division 1, “Intellectual
Traits,” do poor readers show better adjustment in handwork
than they do in academic work; this difference is barely reliable. Poor 
readers are also more poorly adjusted than good readers only for this 
division in academic rooms; this difference, however, is quite reliable. 

Of more importance for this study are those traits evidenced by 
poor readers in the reading situation compared with the handwork 
situation, and by poor readers compared with good readers in the 
academic situation. These are the traits most clearly related to 
retardation in reading. This was the last comparison for Schedule A. 
The criterion for this comparison is: Poor readers better adjusted in 
handwork than in academic work, good readers better adjusted than 
poor readers in academic rooms, and good readers no better adjusted 
in handwork than in academic work. Table IV shows the traits 
related to the attempt to teach reading and similar subjects to these chil- 
dren, that is, the traits which fulfill the three conditions simultaneously. 


TABLE IV. TRAITS CHECKED TO INDICATE GREATER MALADJUSTMENT OF POOR
READERS IN ACADEMIC ROOMS: SCHEDULE B¹

Abstracted or wide awake
Attention sustained²
Mentally lazy or active
Total² (intellectual traits)
Acceptance of authority
Flexibility
Rude or courteous
Reaction to frustration

¹ Criterion of maladjustment same as for Schedule A.

² Items representing “t’s” of 2.0 or more.


The poor readers showed the greatest maladjustment on the
intellectual traits by these three criteria. Of the six traits grouped
in this category, only “sustained attention” shows a highly significant
difference. The traits “abstracted or wide awake” and “mentally
lazy or active” are significant with lower standards. In the other
divisions, “acceptance of authority,” “flexibility,” “rude or courteous,”
and “reaction to frustration” reveal differences, but the
discrepancies are relatively small.

It is not surprising that the intellectual traits would reveal differ- 
ences because of the obvious fact that a child not doing well in school 
subjects would be regarded as inattentive, mentally lazy, and probably 
abstracted by his teacher. These traits are undoubtedly assigned 
on the basis of the child’s attainment in reading and other similar 
subjects. Why there are not larger differences for the other traits is 
not clear, although there are definite indications that the poor readers 
are more aggressive in the academic situation than they are in the 
handwork situation or than the good readers are in the academic 
classes. It should be remembered that the items on Schedule B are 
given trait names, which imply relatively persistent modes of behavior 
in all situations, and the teachers may have appreciated that the 
children did not misbehave in the other classrooms. In the proper 
sense the term “trait” does not strictly apply for comparisons such as
those which are made in the present study, since it does connote rela- 
tively stereotyped behavior. 

It should be noted that only on one trait is there a suggestion that 
the poor readers are better adjusted in academic than in handwork. 
This trait is Number 23, “Yielding or assertive,” and the difference is
barely reliable. Although for a few traits the poor readers show
slightly better adjustment than the good readers in the academic
situation, none of the differences are reliable, with the possible exception
of Number 23, “Yielding or assertive.” In other words, there is
a fairly consistent tendency for poor readers to show maladjustment 
in the academic situation compared with the handwork situation and 
compared with good readers in the academic situation even though all 
differences are not reliable. 

Results obtained from both schedules point to the conclusion that 
the attempt to teach reading and related subjects to illiterate, defec- 
tive, adolescent boys will be met with acts of aggression against the 
teacher and against other children in the academic situation, that the 
children will show marked disinterest in school work, and that they will 
be considered abstracted, inattentive, mentally lazy and generally 
maladjusted intellectually by their academic teachers. Failure in 
reading is probably not entirely responsible for the aggressive acts; 
reading failure is most likely only an index of other handicaps such as 
speech impediment, retardation in arithmetic and spelling and other 
associated scholastic disabilities which academic instruction bears upon 
and which in turn provokes hostile reactions. These reactions are 
manifest immediately in the frustrative situation and subside when the 
children are removed from that situation. 

It seems from these results that a policy of attempting to teach 
reading, and possibly related subjects, in the classroom to the boy who 
has already failed for six or seven years is doomed to failure. The 
teacher in making such an effort will find his class in turmoil, will be 
the object of intense hostility, and in a less confining atmosphere the 
child may attempt to escape by truanting. What methods are to be 
used in teaching these children are not readily apparent, nor are the 
original causes of failure. Where the benefit of individual remedial 
instruction is available, the teacher has been able to adjust to this type 
of child and bring about remarkable educational progress, as well as 
changes in emotional aspects of personality.¹ It would seem that the 
advisability of continued instruction in the classroom without the help 
of a remedial teacher is to be questioned. Various case studies indicate 
that the remedial teacher will encounter hostile reactions, but it is 
possible when dealing with the child individually to adjust to these 
difficulties and usually to remove them entirely after some progress 
in reading is made. 

SUMMARY 


The present study was designed to ascertain what disturbing class- 
room reactions might be attributed to the attempt to teach reading and 
related subjects to illiterate, defective adolescent boys. It was also 
devised as a simple test of the assumption made by Dollard and others 
that aggression inevitably follows frustration. 

The adjustment of one group of twenty-eight boys with average life 
age of 13.8 years, intelligence quotient of 64, and reading grade of 2.2 
was compared with that of another group having an average chrono- 
logical age of 13.9 years, intelligence quotient of 64, and a reading grade 
of 4.2. The adjustment of the two groups in academic rooms was also 
compared with that in handwork rooms. The Haggerty-Olson- 
Wickman Behavior Rating Schedules were used to measure adjustment. 

The poor readers (that is, those with an average reading grade of 
2.2) showed significantly more disinterest in school work, marked 
overactivity, temper outbursts, and speech difficulties in the academic 
rooms than did the good readers (that is, those cases with an average 
reading grade of 4.2). The poor readers tended to bully other children 
more and to be more defiant of discipline. The backward readers were 
also more poorly adjusted in academic rooms than in handwork rooms 
in terms of these same problems, whereas the good readers were not. 
The same situation holds likewise for attentiveness and the total of 
intellectual traits on Division 1 of Schedule B of the 
Haggerty-Olson-Wickman Behavior Rating Schedules. 

1 Kirk, S. A.: “The Effects of Remedial Reading on the Educational Progress 
and Personality Adjustment of High Grade Mentally Deficient Problem Children: 
Ten Case Studies.” The Journal of Juvenile Research, 1934. 

The conclusion is reached that problem behavior and personality 
maladjustment are encountered in the academic classroom when the 
attempt is made to teach reading and related subjects to illiterate, 
defective, adolescent boys. Aggressive reactions are most 
conspicuous. These subside when the children are taught handwork, in 
which they are presumably more successful and certainly have not had 
as long a history of failure. Although Fisher’s statistical methods for 
small groups were used, the conclusions are more tentative than they 
would be if the groups were larger. Advisability of attempting to give 
group instruction in reading and related subjects to children who have 
failed reading for so long a period is questioned. Individual remedial 
teaching is a possible solution for the difficulty. 





A CRITIQUE OF THE COMMON METHOD OF 
ESTIMATING VOCABULARY SIZE, 
TOGETHER WITH SOME DATA ON THE 
ABSOLUTE WORD KNOWLEDGE OF 
EDUCATED ADULTS 


GEORGE W. HARTMANN 


Department of Psychology, Teachers College, Columbia University 


Facility in verbal expression has long been recognized as one of the 
best signs of high mental ability while variations in language skill have 
ordinarily been readily diagnosed by differences in the extent of one’s 
knowledge of words. No one doubts that the general intellectual 
competence of a person with a demonstrated vocabulary of twenty 
thousand words must be greater than that of one with but two thou- 
sand words at his command—a fact which explains the presence of this 
measure among the Stanford-Binet subtests.¹ Nevertheless, there 
are a number of ambiguities involved in these vocabulary estimates 
which should make a discussion of them of some interest to both 
teachers of English and educational psychologists. 

At least two important considerations have not been dealt with in 
current practice involving vocabulary estimates. One refers to the 
adequacy of the word-sampling itself and the other to the difference 
in the degree of understanding of a given item. Both issues are 
worthy of more treatment than they have received. 

1 Terman finds that extent of vocabulary correlates .90 with the total 
intelligence test score, a value which approximates the reliability and validity 
of the latter measure itself. Donald Snedden held that vocabulary size could be 
legitimately used as a substitute for knowledge of the child’s IQ. Cf. A Study in 
Disguised Intelligence Tests (Interview Form), Teachers College, Contributions to 
Education, 1927, No. 291. 


DETERMINING THE WORD KNOWLEDGE OF A SUBJECT 


The customary method of sampling consists in selecting one 
hundred or so words at random from a dictionary. If the lexicon is 
known to contain one hundred thousand entries spread over 
twenty-four hundred pages one simply selects a term from a constant position 
on every twenty-fourth page; usually the first word printed in the 
upper right-hand corner is chosen, although any other fixed location 
could be selected. Occasionally rare, technical, or variant forms are 
first eliminated; this, however, seems to be questionable practice 
because derivations constitute so large a part of any language. If one 
tried to avoid them entirely one would be left with but a few hundred 
basic roots! Having obtained one’s sample in this manner, this 
limited list of words is presented to the subject for definition. If he 
secures sixty right out of a possible hundred, his vocabulary is easily 
estimated to be sixty thousand words. 
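
The ratio procedure just described can be put into a few lines of Python. This is only an illustrative sketch: the function names `sampling_interval` and `estimate_vocabulary` are invented here, and the figures are the ones quoted in the paragraph above.

```python
def sampling_interval(total_pages: int, sample_size: int) -> int:
    """Pages to step between sampled words (hypothetical helper)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return total_pages // sample_size


def estimate_vocabulary(words_known: int, sample_size: int,
                        dictionary_entries: int) -> float:
    """Scale the proportion of sampled words known up to the whole dictionary."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= words_known <= sample_size:
        raise ValueError("words_known must lie between 0 and sample_size")
    return dictionary_entries * words_known / sample_size


if __name__ == "__main__":
    # 2,400 pages and a 100-word sample: one word from every 24th page.
    print(sampling_interval(2400, 100))
    # 60 right out of 100, dictionary of 100,000 entries: 60,000 words.
    print(estimate_vocabulary(60, 100, 100_000))
```

The multiplier is simply the dictionary size divided by the sample size, so the estimate can never exceed the number of entries in whatever dictionary happened to be sampled.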

On the surface, this common-sense ratio procedure appears to be 
faultless. But how correct is this ‘estimate’? When one remembers 
that the prolific Shakespeare is admired for the active use of but 
twenty-four thousand words in his vast literary productions,¹ it seems 
a bit odd to find an average undergraduate with a “passive” vocabu- 
lary exceeding two hundred thousand words (determined according to 
the standard method). Even allowing for the enormous growth of the 
English tongue during the last three centuries, this astounding 
conclusion cannot lead to anything but disbelief unless properly 
reinterpreted. 

The principal difficulty, of course, lies in the sampling technique 
employed. Obviously the total vocabulary estimate depends on the 
size of the dictionary employed. The average person who knows 
thirty words out of one hundred taken from an unabridged dictionary 
may know seventy out of one hundred chosen from a desk or 
vest-pocket volume. Yet his estimated vocabulary in the former case 
exceeds several times the latter because of the larger multiplying 
constant (= the value of the fraction 
$\frac{n \text{ in dictionary}}{n \text{ in sample}}$). The solution 
to the apparent paradox lies in the fact that one is here working with 
a selected or weighted “sample of a sample.” No lexicon, however 
huge, contains all the words ever written or spoken in English, and all 
smaller ones, in turn, are restricted to definitions of the more common 
items. Clearly, it is less symptomatic of a wide vocabulary to know 
many ordinary forms than to understand a smaller number drawn from 
a more extensive group. Sample lists taken from any source will 
probably always yield a true index of the relative word knowledge of 
the population tested, but they cannot give a correct picture of the 
absolute vocabulary. All vocabulary “estimates” made in recent 
years fit only the dictionaries upon which they were based.¹ 

1 Kirkpatrick gives the number of words in Robinson Crusoe as between 5,000 
and 6,000 (Science, Vol. xviii, 1891, p. 108). According to Gerlach (Vocabulary Studies, 
Colorado Springs, 1917, p. 14), Milton in his poems alone used 11,377 words, 
Cowper, 11,284, and Shelley 15,957. Actual count shows 8,674 different words 
used in the Old Testament (Vizetelly: Essentials of English Speech, p. 215). 

Another serious limitation of the common procedure is that it makes 
little if any allowance for qualitative variations in the precision of the 
definitions. One can “know” anything very slightly or exceedingly 
well. Generally, the scorer is satisfied if the subject has indicated the 
least “idea” of the term’s meaning. Argon, e.g., may be correctly 
defined as a chemical element or as a rare gas, but the second 
designation implies a richer background of acquaintance with the word.² 
While it is true that a subject who gives the more complete definitions 
will also have a larger proportion right, it is probable that the actual 
gap between the superior and average individual is far greater than the 
results indicate. 


VERIFYING THE PHENOMENON OF LARGE VOCABULARIES BY DIFFERENT 
TECHNIQUES 


The writer was started on this train of thought by a Summer 
session experience with a graduate class of one hundred experienced 
teachers. To demonstrate the existence of individual differences even 
among highly homogeneous groups he administered a word list of 
one hundred items taken from every twenty-fourth page of the latest 
Webster’s New International Dictionary, which contains nearly 
twenty-four hundred pages and is said to list four hundred thousand 
terms. The mean score was 56, yielding an average vocabulary of 
roughly two hundred twenty-four thousand words. This incredible 
result is even more startling when one reflects upon the mediocre 
English employed even by well-trained people. Granted that this 
test measures only word recognition or one’s reading vocabulary, 
which is conceded to be more extensive than the writing or speaking 
repertory, the obtained value is still fantastically large. The modest 
few hundred or thousand words which are often said to meet the needs 
of the ordinary person did not fit this finding at all. Of course, the 
comparative social or educational value of knowing rare or common 
words is not under discussion here. 

1 The writer has seen the following values given for the mean vocabulary of 
fourth-grade children: 4,000; 5,220; 6,887; 7,020; 10,395; 10,886. No two 
authorities give the same estimate, although some agree more closely than others. All 
these “counts” are but rough approximations to the unknown reality. 

2 Cuff remarks: “The fact that a child can define the word corn as a noun does 
not prove that he can define it as a verb, and the fact that a child can give some 
definition of a word or can recognize a simple definition of a word does not show 
that he has a thorough acquaintance with it.” (“Vocabulary tests.” J. Educ. 
Psychol., Vol. xxi, 1930, pp. 214-220.) 

To test the hypothesis that the nature of the sampling was responsi- 
ble for these gross divergences, four shorter lists from four different 
sources were compiled as follows: 


(1) Seventy-five words from the current Funk & Wagnalls Unabridged 
Dictionary (containing approximately 450,000 words—454,088 according to 
Seashore’s laborious sampling calculations). 

(2) Fifty words from Murray’s Oxford English Dictionary (containing 
roughly 414,000 words). 

(3) Twenty-five words from the Webster Collegiate Dictionary (with 
97,000 words). 

(4) Twenty-five words from the Winston Simplified Dictionary (100,000 
words). 


None of the lists contained any word in common. They were given 
at one sitting to a new group of one hundred college students, largely 
juniors and seniors. The “identification” technique was employed, 
i.e., each item was considered correct if the proper synonym, use, or 
illustration was given.¹ Use of the conventional method of computing 
showed that the average member of this group had a mean vocabulary 
of 238,620 words according to List I; 216,920 with List II; 54,564 with 
List III; and 62,000 with List IV. It will be observed that the size of 
the vocabularies obtained parallels perfectly the dimensions of the 
dictionaries upon which they were founded. That it is the magnitude 
of the lexicon which is the deciding factor rather than the size of the 
sample word series is evident from a comparison of the figures for lists 
I and II above. Apparently the smaller dictionary does not give a 
person the opportunity of demonstrating all the words he knows! Of 
some interest and significance is the fact that in every case the 
estimated vocabularies are approximately the same per cent of the 
dictionary totals: thus, the average subject “knows” 53 per cent of 
all the words in dictionary I, 52 per cent of dictionary II, 55 per cent of 
dictionary III, and 62 per cent of dictionary IV. 
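
The proportions quoted above can be rechecked from the figures in the text. The sketch below is illustrative only (the helper `percent_of_dictionary` is a name chosen here), and it takes the round dictionary totals given in the list of sources as its bases.

```python
# (dictionary entries, conventional vocabulary estimate), as quoted in the text
LISTS = {
    "I": (450_000, 238_620),    # Funk & Wagnalls Unabridged
    "II": (414_000, 216_920),   # Oxford English Dictionary
    "III": (97_000, 54_564),    # Webster Collegiate
    "IV": (100_000, 62_000),    # Winston Simplified
}


def percent_of_dictionary(estimate: float, entries: int) -> float:
    """Estimated vocabulary expressed as a percentage of the dictionary total."""
    return 100 * estimate / entries


if __name__ == "__main__":
    for name, (entries, estimate) in LISTS.items():
        share = percent_of_dictionary(estimate, entries)
        print(f"List {name}: {estimate:,} of {entries:,} entries = {share:.1f} per cent")
```

On these bases the share for List III comes out nearer 56 than the 55 per cent quoted, a small discrepancy that may reflect rounding or a slightly different base in the original computation. The general point is unaffected: the shares stay within about ten points of one another while the absolute estimates differ by a factor of four or more.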

1 Sims found that this method is more reliable and valid than the 
multiple-response, matching, and checking procedures. The identification test correlates 
highest with the mean scores of the other three tests, and it yields the lowest PE. 
Cf. “Reliability and validity of four types of vocabulary tests.” J. Educ. Res., 
Vol. xx, 1929, pp. 91-96. 

The approximate stability of this proportion offers some 
justification for a policy of converting one total estimate into another by simply 
multiplying the values derived from smaller dictionary samples (since 
they are crude “samples of samples” determined in detail by the 
varying policies of the lexicographers) by the number of times the 
largest dictionary exceeds it; e.g., we may assume that a subject who 
knows 50 per cent of any word list will also know about 50 per cent of 
any other similarly prepared (unless the initial list was based upon 
too small a source, in which case it would lose its differentiating value). 
The absolute probable error, of course, tends to increase greatly with 
the larger lexicon samples, since the spread of the interval of uncer- 
tainty becomes proportionately wider. 
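
A rough sketch of the conversion procedure described above follows. The function name `convert_estimate` is hypothetical, and the example assumes, as the text does, that the proportion of words known carries over from one dictionary to another.

```python
def convert_estimate(estimate: float, source_entries: int,
                     target_entries: int) -> float:
    """Rescale an estimate from one dictionary base to another.

    Assumes the proportion of words known is the same in both dictionaries,
    which is the working assumption stated in the text, not an established fact.
    """
    if source_entries <= 0:
        raise ValueError("source_entries must be positive")
    return estimate * target_entries / source_entries


if __name__ == "__main__":
    # List III estimate (Webster Collegiate, 97,000 words) restated on the
    # base of the Funk & Wagnalls Unabridged (approximately 450,000 words).
    print(round(convert_estimate(54_564, 97_000, 450_000)))
```

The converted figure is roughly 253,000 words, against the 238,620 obtained directly from List I, so the conversion is only approximate, as the remark on the widening probable error suggests.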

An additional check on the legitimacy of the rough conversion 
procedure just mentioned is provided by the high intercorrelations 
among the different lists as shown in Table I. 


TABLE I. INTERCORRELATIONS AMONG FOUR WORD SAMPLES DRAWN FROM 
DIFFERENT DICTIONARIES (VARIABLES = PER CENT CORRECT). N = 100 

List    I      II     III    IV 
I              .77    .73    .85 
II                    .79    .81 
III                          .88 
IV 

These values are about as high as the reliability coefficients of most 
abbreviated tests and tend to support the view that interchangeability 
of the “random” (really, systematically built) word lists is defensible. 

In continuing this analysis, it seemed desirable to measure the 
stability of estimates based upon samples of varying size drawn from 
the same dictionary. The following lists were compiled solely from 
the Webster Unabridged International volume: 


100 words from the upper left-hand corner of the open book (every twenty- 
fourth page). 

50 words from the bottom right-hand corner of the open book (every forty- 
eighth page). 

25 words from the bottom left-hand corner of the open book (every ninety- 
sixth page). 

25 words from the upper right-hand corner of the open book (every ninety- 
sixth page). 


The subjects on this occasion were sixty normal-school graduates with 
varying amounts of teaching experience. From the figures below, 
it will be seen that the same astonishingly high vocabularies result: 

List, words    Locus           Mean number right    Estimated vocabulary 
100            Upper left      63.37                253,480 
50             Bottom right    32.37                258,960 
25             Bottom left     17.88                286,080 
25             Upper right     14.95                239,200 
Mean of the four estimates                          259,430 

It is puzzling to explain why one twenty-five-word list should be 
twenty per cent easier than another, except in terms of too limited a 
sampling. This fact, in conjunction with the reduced coefficients of 
correlation shown in Table II for the smaller lists, suggests that one 
should never use less than fifty words if a reasonably close estimate is 
desired. Interestingly enough, this is about one-hundredth of one 
per cent of the half-million words in the English language—the 
amount empirically considered by opinion-poll specialists the minimum 
proportion of the total electorate to be used in predicting the results of 
Federal elections! 
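
The figures in the table above, and the two proportions mentioned in this paragraph, can be recomputed as follows. This is a sketch for checking the arithmetic only: the 400,000-word base is the round total the text assigns to Webster’s, and the variable names are chosen here for illustration.

```python
WEBSTER_ENTRIES = 400_000  # round total attributed to Webster's in the text

# (list length, locus, mean number right) from the table above
LISTS = [
    (100, "upper left", 63.37),
    (50, "bottom right", 32.37),
    (25, "bottom left", 17.88),
    (25, "upper right", 14.95),
]


def estimate(mean_right: float, list_length: int,
             entries: int = WEBSTER_ENTRIES) -> float:
    """Conventional ratio estimate of vocabulary size."""
    return entries * mean_right / list_length


estimates = [round(estimate(mean, n)) for n, _, mean in LISTS]
mean_estimate = sum(estimates) / len(estimates)

# How much "easier" the bottom-left 25-word list was than the upper-right one.
easier_by = (17.88 - 14.95) / 14.95

# Fifty words as a share of a half-million-word language.
sample_share = 50 / 500_000

if __name__ == "__main__":
    print(estimates, mean_estimate)
    print(f"{easier_by:.1%} easier; sample share = {sample_share:.2%}")
```

The two twenty-five-word lists differ by a little under twenty per cent in mean number right, which carries straight through to estimates almost 47,000 words apart.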


TABLE II. INTERCORRELATIONS AMONG FOUR WORD SAMPLES OF VARYING SIZE 
DERIVED FROM THE SAME DICTIONARY (N = 60) 

List length    100    50     25 bl          25 ur 
100                   .86    .74            .71 
50                           [illegible]    .71 
25 bl                                       .77 
25 ur 

These correlational values are very similar to the figures presented 
in Table I and indirectly support our contention that the commonly 
accepted estimates of absolute adult vocabularies are in serious need 
of upward revision. In a further search for confirmation, two word 
lists were selected in straight serial order from Webster’s and presented 
to forty-five undergraduates majoring in industrial education, a 
group of prospective shopwork instructors, whose mastery of English 
is relatively weaker than their manual skill. The initial letters “S” 
and “E” were chosen, largely because they are respectively the most 
frequently used consonant and vowel. One hundred separate words 
in “s,” beginning at se-baptist and ending with semen (omitting 
compounds of second- and self-) yielded a mean vocabulary of 176,080 
(44.02 per cent right = average of the sample); fifty consecutive 
words in “e” from easterner to ecclesia gave a mean vocabulary of 
145,040 words (18.13 words right out of fifty, or 36.26 per cent = 
average of the original sample); the range for both sets ran from 
72,000 to 232,000. These values 
are somewhat lower than those reported above, but the explanation 
undoubtedly is to be found partly in the “non-linguistic” nature of the 
population employed and partly in the deficiencies of a “concentrated” 
sample as compared with the merits of the more orthodox “scattered” 
types used above. 


HOW INESCAPABLE IS THIS CONCLUSION? 


It is futile to keep on multiplying instances which all confirm our 
main thesis; viz., that the college man’s vocabulary is far greater than 
that with which he has been credited in the past. Is there any 
“joker” in the logic of the estimating procedure? Perhaps the 
dictionary content is not sufficiently homogeneous to be correctly 
“represented” by one hundred entries. Seashore has suggested that the 
difficulty may lie in the definition of a word. The four hundred 
thousand or more terms included in an unabridged lexicon embrace 
not only “basic” words, but also derived forms, such as verb forms, 
compounds, as well as different usages and meanings of the same term. 
This interpretation might have some slight weight with the Funk 
and Wagnalls volume, in which the heavy black face type words at 
the upper left-hand corner of the definition paragraph constitute the 
true “basic” forms, while the rest of the paragraph contains less 
significant derivatives. However, Webster runs each word in pure 
serial order; yet estimates derived from either source agree closely 
(see Table I, above). Furthermore, from the philological standpoint 
the “basic” words just referred to are not true root forms. It is not 
correct to say that if one knows the “basic” term all its “derived” 
forms are also clear, particularly since the tracing of the historical 
affinities of words is an intricate scientific discipline in itself. The 
pitfalls of “folk etymology” illustrate the types of errors made in 
this field. 

There is only one conclusion possible—the average undergraduate 
has an actual comprehension or recognition vocabulary in excess of 
two hundred thousand words. In any sample of X terms selected in 
truly random fashion from the unabridged dictionaries, the typical 
college student will be able to define correctly X/2 as a minimum. 
His “contextual” or operative reading vocabulary is probably even 
larger. Evidently scholars have done a grave injustice to the “man 
in the street” by assuming, on the basis of old and faulty evidence 
secured by defective techniques, that his command of words is restricted 
to a few thousand words at best. 

The essential correctness of this view is supported and fully 
confirmed by the meticulous studies of Seashore and Eckerson¹ in this 
area, although even their estimates do not run quite as high as those 
found in this inquiry. The figures presented in this paper are all 
based upon the performance of undergraduate and graduate students 
at the Pennsylvania State College during the years 1932 and 1934, 
whereas the Seashore data apparently were secured from under- 
graduates exclusively at the University of Oregon, the University of 
Southern California, and Northwestern University during the entire 
decade of the ’thirties. The cultural time factor is consequently the 
same, a matter of no small importance, for it is highly probable 
that the average person’s vocabulary has increased substantially 
during recent generations. 

Seashore’s computations indicate that his average undergraduate 
“knows” 155,736 words (assuming that the Funk and Wagnalls 
Unabridged Dictionary contains a total of 454,088 words); the statis- 
tics of the present writer suggest that this apparently fantastic figure 
is itself likely to be a gross underestimate by about one hundred 
thousand words, since a similar student population when referred to the 
calculation-base provided by the Webster dictionary with its presumed 
smaller “round” number of four hundred thousand words yielded a 
mean consistently larger than that reported by Seashore. This 
difference in estimated absolute vocabulary is not important as both 
studies agree in pushing the averages up into figures of a magnitude 
conventionally deemed unbelievable. 





1 Seashore, R. H. and Eckerson, L. D.: “The measurement of individual 
differences in general English vocabularies.” J. Educ. Psychol., Vol. xxxi, 1940, pp. 
14-38. Page 25 of the Seashore paper contains a reference to this present inquiry 
which in unpublished form was available as early as 1933. It was not published 
sooner because of the writer’s reluctance to make a claim so far out of line with 
current opinion. Now that the evidence appears overwhelming, such a fear can 
no longer be justified. 





DERIVING COMPREHENSION, RATE AND ACCURACY 
OF READING NORMS FOR A SHORT FORM 
OF THE METROPOLITAN ACHIEVEMENT 
READING TEST 


GEORGE SPACHE 


Friends Seminary, New York City 


Since the publication of their first edition in 1933, the Metropolitan 
Achievement Tests have come to be one of the most widely used 
batteries of their kind. Although standardized on public school 
populations, the batteries have been found suitable for surveys of both 
public and private schools. If used for classification or placement, as 
at the beginning of the school year when a fairly comprehensive but 
rapid survey of entering pupils’ capabilities is desired, the testing time 
required is a great deal to demand of the average child of the inter- 
mediate grades. A total battery, such as the Intermediate Partial for 
Grades IV-VI, requires approximately two and three-quarter hours. 
It includes tests of reading, vocabulary, arithmetic fundamentals and 
problems, English usage and spelling. Of course, the testing time 
could be reduced by eliminating some of the tests included in this 
battery. But if the results of the initial testing are to be used not only 
for placement, but also for outlining tutoring or remedial work in the 
early part of the school year, use of only a portion of the battery is not 
desirable. In our opinion, the situation is best met by abbreviating 
certain of the tests rather than by omitting them entirely. This 
discussion is an attempt to demonstrate a method of abbreviating the 
testing time of one of the Metropolitan tests while still securing a valid 
estimate of the pupils’ abilities. 

To demonstrate the method of abbreviation, we have chosen to 
work with the reading test of the Revised Intermediate Battery. 
Similar abbreviations have been made by the writer of the arithmetic 
fundamentals, arithmetic problems and spelling tests of the same 
battery. However, space does not permit presentation of the data 
on the short forms of the other tests. The present discussion will be 
confined to the possibilities of abbreviating the reading test. 

The reading test of the Intermediate Battery' is composed of two 
sections. In the first, the task is to supply the words omitted at 
various points throughout the paragraphs. This section forms 
approximately two-thirds of the entire test or forty-three of the 
sixty-four test items. In the second section, the procedure is varied by 
requiring the pupil to answer a number of questions on each of six 
paragraphs. The two sections are administered separately in that 
at the end of fifteen minutes the pupils are instructed to begin the 
second section, whether or not they have completed all the items of 
the first section. Ten minutes are allowed for the second reading 
section. In difficulty the two sections overlap somewhat inasmuch as 
pupils of the lowest grade at which the test is suitable are usually able 
to complete items successfully in both sections of the reading test. 

Since the two parts of the Metropolitan Reading Test are relatively 
independent scales, it is possible to secure a measure of reading ability 
by the use of either scale. Thus, for example, it is possible to abbrevi- 
ate the test in the interests of saving testing time by using only the first 
section. Of course, since the first section is only two-thirds of the 
entire test, its results are probably slightly less reliable than those 
obtained from the use of the whole. However, if well constructed, the 
single scale should still prove a valid and fairly reliable short reading 
test. 


TABLE I. CORRELATION BETWEEN RAW SCORE ON SHORT FORM AND GRADE SCORE 
ON LONG FORM OF METROPOLITAN READING TESTS 

                                                       Raw score       Grade score 
Grade    N      Metropolitan battery            r      Mean    SD      Mean    SD 
III      100    Intermediate Partial, Form B    .929   16.4    7.76    5.3     1.00 
IV       89     Intermediate Partial, Form B    .901   24.9    7.55    6.37    .93 
V        93     Intermediate Partial, Form B    .894   30.0    7.48    6.95    1.19 
III-V    282    Intermediate Partial, Form B    .928   23.6    9.5     6.18    1.26 

Some of the merits of the short form of the Metropolitan Reading 
Test as a measure of reading ability may be determined from the 
correlation between raw scores on the short form and grade scores 
on the entire test. These are not validity coefficients inasmuch as 
they are the relationships between an entire test and two-thirds 
of the same test. However, the correlations can be interpreted as 
indicating marked tendencies for raw scores on the short form to be 
related to grade scores on the whole test; or to express it somewhat 
differently, it is possible to predict the probable grade scores on the 
whole test from raw scores on the short form with a great deal of 
accuracy. Since the correlations given in Table I are almost as great 
as the reliability coefficients for the whole test,* we may conclude 
that the estimates of reading ability made by means of the short form 
are practically as similar to those from the entire test as scores from 
another form of the entire reading test would be. 

The table includes data for the third, fourth and fifth grades 
although the Metropolitan Intermediate Tests are offered by their 
authors for use in grades IV to VI. It is a common practice in private 
schools to use the test intended for the public school grade above that 
being tested in order to give ample opportunity for the superior 
mentality and achievement of private school pupils to manifest itself.* 
This practice does not affect use of the short form reading test with 
public school populations. 

The desirability of the use of the short form reading test may be 
determined by the apparent difficulty of the test. It is commonly 
accepted that a test should permit the average pupil to complete 
successfully about fifty to sixty per cent of the items attempted. 
Difficulty of the short form may be estimated from the fact that third- 
graders completed successfully sixty-seven per cent of the items 
attempted; fourth-graders, sixty-nine per cent; and fifth-graders, 
seventy-five per cent. On the basis of this testing, which occurred in 
May or at the end of the school year, the short form may be considered 
as of moderate difficulty for third- and fourth-graders and of slight 
difficulty for fifth-graders. 

The norms now available for the entire test cannot be used in 
translating the raw scores from the short form into reading grade 
scores. This can be shown quite readily. The mean raw score of 
23.6 for the combined grades III-V is equivalent to a grade score of 
5.8 according to the test’s present norms. However, the actual mean 
grade score is 6.18. The difference is accounted for by the fact that 
items completed on the second section, which are credited in deter- 
mining grade score according to the present norms, are not credited 
in the raw score on the short form. It is apparent that use of the 
present norms with the short form would result in grade scores which 
grossly underestimate the actual reading ability. 

* The reliability coefficients, according to the Supervisor’s Manual,’ are as 
follows: grade IV, .865, V, .905, and VI, .818. 

In establishing new norms for the short form, it is possible to 
combine the data from the three grades since the correlations between 
the variables and the standard deviations of each grade are not 
significantly different. In Table II equivalent grade scores for the raw 
scores possible on the abbreviated reading test are given. These are 
derived from the regression equations in score form of the correlation 
between raw and grade scores in grades III-V. 
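
A regression equation in score form predicts a grade score from a raw score by way of the two means, the two standard deviations, and the correlation. The sketch below is a reconstruction, not the author’s worksheet: it assumes the combined III-V row of Table I (r = .928, raw mean 23.6 and SD 9.5, grade mean 6.18 and SD 1.26) supplies the constants, which the text does not state outright, and `regression_predictor` is a name chosen here.

```python
from typing import Callable


def regression_predictor(mean_x: float, sd_x: float, mean_y: float,
                         sd_y: float, r: float) -> Callable[[float], float]:
    """Return y' = mean_y + r * (sd_y / sd_x) * (x - mean_x)."""
    if sd_x <= 0:
        raise ValueError("sd_x must be positive")
    slope = r * sd_y / sd_x

    def predict(x: float) -> float:
        return mean_y + slope * (x - mean_x)

    return predict


# Constants assumed from the combined grades III-V row of Table I.
predict_grade = regression_predictor(
    mean_x=23.6, sd_x=9.5, mean_y=6.18, sd_y=1.26, r=0.928
)

if __name__ == "__main__":
    for raw in (1, 22, 30):
        print(raw, round(predict_grade(raw), 1))
```

For raw scores of 1, 22, and 30 this gives 3.4, 6.0, and 7.0, agreeing with the comprehension norms tabled for those scores. Toward the top of the range the reconstruction runs about a tenth of a grade higher than the published norms, so the original constants presumably differed slightly from the rounded values used here.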


TABLE II.—NORMS FOR COMPREHENSION AND RATE FOR THE SHORT FORM OF THE 
METROPOLITAN INTERMEDIATE READING TEST 

Raw score or items attempted | Grade score: Comprehension | Grade score: Rate | Raw score or items attempted | Grade score: Comprehension | Grade score: Rate 
43 | 8.5 | 7.8 | 22 | 6.0 | 5.0 
42 | 8.4 | 7.7 | 21 | 5.9 | 4.8 
41 | 8.3 | 7.5 | 20 | 5.7 | 4.7 
40 | 8.2 | 7.4 | 19 | 5.6 | 4.6 
39 | 8.0 | 7.3 | 18 | 5.5 | 4.4 
38 | 7.9 | [illegible] | 17 | 5.3 | 4.3 
37 | 7.8 | 7.0 | 16 | 5.2 | 4.2 
36 | 7.7 | 6.8 | 15 | 5.1 | 4.0 
35 | 7.6 | 6.7 | 14 | 5.0 | 3.9 
34 | 7.4 | 6.6 | 13 | 4.9 | 3.8 
33 | 7.3 | 6.4 | 12 | 4.8 | 3.6 
32 | 7.2 | 6.3 | 11 | 4.6 | 3.5 
31 | 7.1 | 6.2 | 10 | 4.5 | 3.4 
30 | 7.0 | 6.0 | 9 | 4.4 | 
29 | 6.8 | 5.9 | 8 | 4.3 | 
28 | 6.7 | 5.8 | 7 | 4.1 | 
27 | 6.6 | 5.6 | 6 | 4.0 | 
26 | 6.4 | 5.5 | 5 | 3.9 | 
25 | 6.3 | 5.4 | 4 | 3.8 | 
24 | 6.2 | 5.2 | 3 | 3.7 | 
23 | 6.1 | 5.1 | 2 | 3.5 | 
 | | | 1 | 3.4 | 

















It is also possible to present norms for rate of reading based on the 
same population. Rate of reading is, of course, related to age and 
grade placement, the correlation between items attempted and grade 
score on the short form test being .847 for two hundred eighty-two 
cases. Norms for rate should, therefore, be expressed in terms of 
grade scores. The data given in Table II are based upon the regression 
equation in score form of the correlation between number of items 
attempted and grade score on the short form test. In determining the 
number of items attempted, all items for which an answer was offered, 
however incomplete or incorrect, were considered. 
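
For reference, a regression equation in score form is conventionally written as below. The notation is the standard textbook one and is not taken from the original report: X is the raw score (or the number of items attempted), Ŷ the predicted grade score, M and σ the respective means and standard deviations, and r the correlation between the two variables (.847 in the case of rate).

```latex
\hat{Y} = M_Y + r\,\frac{\sigma_Y}{\sigma_X}\,\bigl(X - M_X\bigr)
```

Each entry of a norm table of this kind is then the value of Ŷ, rounded to one decimal, for one possible value of X.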
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In the table a single column is used for the range of possible raw 
scores (items correct) and for the range of possible items attempted. 
Having determined the number of items correct and the number 
attempted, the respective grade scores in comprehension and rate may 
be found in the appropriate columns. 

The validity of the short form reading test, or the extent to which 
predicted grade scores obtained on the short form resemble actual 
grade scores on the entire test, may be estimated from the data upon 
which Table II was established. The predicted grade scores had the 
following characteristics: Mean, 6.15, median, 6.17, standard deviation, 
1.14. Similar measures for the actual grade scores were: Mean, 6.18, 
median, 6.10, standard deviation, 1.26. The differences between the 
corresponding measures of central tendency of the two distributions 
are insignificant.* There is, of course, some tendency for the 
comprehension grade scores on the short form test to underestimate actual 
reading ability at either end of the range of grade scores. This is due 
to the fact that the norms for the short form extend from 3.4 to 8.5 
while those for the entire test extend from 3.7 to 9.5. 

Other data on the validity and reliability of the short form reading 
test are available but cannot be presented because of the limitations of 
space. However, in validating comparisons with other common read- 
ing tests such as the Progressive Achievement Reading Comprehension 
or the New Stanford Achievement Paragraph Comprehension, there 
are insignificant differences in validity between the short and long 
forms. The short form test is as much like other common reading 
tests, that is, as valid, as the entire Metropolitan test. In comparisons of 
reliability, involving retest coefficients, the short edition compares 
favorably with the entire test. Similarly, in coefficients between two 
forms of the short and long tests (an abbreviated edition of Form C 
of the Metropolitan test having been established) the short form 
compares favorably with the entire test. In both types of reliability 
coefficients, the differences between the two editions are insignificant. 

It is important to learn whether a pupil’s reading is characterized 
by superficial or adequate comprehension. The pupil who evidences 
accurate comprehension in only twenty per cent of his reading is a 
poorer reader than one who completes eighty per cent of his reading 
correctly, even though both may receive the same grade score. The 
former may be reading too rapidly and superficially for good com- 





* In computing these differences a correlation of .900 between the variables and 
the long formula for the standard error of the difference was used.⁴ 
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prehension while the latter may be reading more slowly than his age 
and grade placement would warrant. Analysis such as this may be 
made on the basis of norms for rate and accuracy of reading. Cases of 
superficial comprehension through too great a rate, inadequate 
comprehension accompanied by average rate, good comprehension 
hindered by slow rate, etc., may be detected and distinguished by 
contrasting comprehension, rate and accuracy of reading scores. To 
aid in this type of reading diagnosis, norms for accuracy of reading or 
for accuracy of comprehension may be offered. 

The percentile norms for accuracy given in Table III are expressed 
in terms of the per cent of items correct, i.e., number of items correct 
divided by number of items attempted. Inspection of the data from 
our private school population showed little relationship between grade 
score and accuracy. Therefore, the data for the three grades were 
combined in formulating the table. These norms are based upon the 
achievement of private school pupils and may not be serviceable in 
interpreting the achievement of public school pupils until it has been 
shown that accuracy of reading is similar in the two types of schools. 


TABLE III.—PERCENTILE NORMS FOR ACCURACY OF READING FOR THE SHORT 
FORM METROPOLITAN ACHIEVEMENT READING TEST 

Percentile | Per cent correct 
90 | 95 
80 | 90 
70 | 86 
60 | 83 
50 | 79 
40 | 75 
30 | 70 
20 | 64 
10 | 53 
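
The accuracy computation and the use of Table III can be sketched as follows. This is only an illustration: the function names are hypothetical, and the rule of reporting the highest tabled percentile whose per cent correct the pupil reaches is an assumption, since the text does not say how intermediate values are to be read.

```python
# Percentile norms for accuracy, transcribed from Table III:
# (percentile, per cent of attempted items answered correctly).
ACCURACY_NORMS = [(90, 95), (80, 90), (70, 86), (60, 83), (50, 79),
                  (40, 75), (30, 70), (20, 64), (10, 53)]


def accuracy_percent(items_correct, items_attempted):
    """Per cent correct: items correct divided by items attempted, times 100."""
    if items_attempted <= 0:
        raise ValueError("at least one item must be attempted")
    return 100.0 * items_correct / items_attempted


def percentile_reached(accuracy):
    """Highest tabled percentile whose per cent correct is met (assumed rule).

    Returns None when the accuracy falls below the 10th-percentile value.
    """
    for percentile, per_cent_correct in ACCURACY_NORMS:
        if accuracy >= per_cent_correct:
            return percentile
    return None
```

A pupil with 24 items correct out of 30 attempted has an accuracy of 80 per cent, which under the assumed rule falls at the 50th percentile.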

REFERENCES 


1. Allen, Richard D., Bixler, Harold H., Connor, Wm. L. and Graham, Frederick 
B.: Metropolitan Achievement Tests—Intermediate Battery Partial. Yonkers: 
World Book, 1934. 

2. Allen, Richard D., Bixler, Harold H., Connor, Wm. L., Graham, Frederick B. 
and Hildreth, Gertrude: Metropolitan Achievement Tests—Supervisor’s 
Manual. Yonkers: World Book, 1933. 

3. Spache, George: ‘‘The Use of the Kuhlmann-Anderson Intelligence Tests in 
Private Schools.” Journal Educational Psychology, Vol. xxx, November, 
1939, pp. 618-623. 

4. Walker, Helen M.: “Concerning the Standard Error of a Difference.” Journal 
Educational Psychology, Vol. xx, January, 1929, pp. 53-60. 











THE VOCABULARY OF EDUCATIONAL PSYCHOLOGY 


GLENN MYERS BLAIR 
University of Illinois 


At a recent meeting of a group of educational psychologists at the 
University of Illinois, the question arose as to what psychological 
words and terms should be mastered and understood by students who 
were just finishing the introductory course in educational psychology. 
It was the feeling of some of the members of the group that educational 
psychology possesses a standard vocabulary with which every pros- 
pective teacher should be familiar. A doubt, however, appeared in 
the mind of the writer as to whether the science of educational psy- 
chology has reached that stage of stability where a uniform vocabulary 
has emerged. Not so long ago a book appeared with the title Seven 
Psychologies,¹ indicating somewhat the diversity of viewpoints in 
psychology. In America, there are also several educations. We 
have the “progressives” and the “essentialists,” to mention only two 
divisions of educational thought. If we should take the two mentioned 
educations and mix them with the seven psychologies, the resulting 
number of combinations would be fourteen. Thus we might have at 
least fourteen educational psychologies. 

The writer decided to make a study to determine if educational 
psychology has a uniform technical vocabulary, and if so to determine 
what words would be included in such a list. 


PROCEDURE 


The procedure of the study called for a careful analysis of the 
vocabulary used by the authors of the following eight very recent books 
in educational psychology: 


1. Commins, W. D.: Principles of Educational Psychology. New York: 
The Ronald Press, 1937. 

2. Douglas, O. B., and Holland, B. F.: Fundamentals of Educational 
Psychology. New York: The Macmillan Co., 1938. 

3. Garth, Thomas R.: Educational Psychology. New York: Prentice- 
Hall, Inc., 1937. 

4. Griffith, Coleman R.: Psychology Applied to Teaching and Learning. 
New York: Farrar & Rinehart, 1939. 

5. Judd, Charles H.: Educational Psychology. Boston: Houghton-Mifflin 
Co., 1939. 


1 Heidbreder, Edna: Seven Psychologies. New York: The Century Co., 1933. 
366 The Journal of Educational Psychology 


6. LaRue, Daniel W.: Educational Psychology. New York: Thomas Nelson 
& Sons, 1939. 

7. Mursell, James L.: Educational Psychology. New York: W. W. Norton 
& Co., 1939. 

8. Sandiford, Peter: Foundations of Educational Psychology. New York: 
Longmans, Green and Co., 1938. 


All of these books bear the copyright imprint of 1937 or later, and 
include virtually every text on educational psychology which made its 
appearance during the three-year period 1937-1940. 

For each of the eight books, the analysis consisted of the following 
two steps: 


(1) Each word on every odd page was looked up in A Teacher’s Word Book 
of 20,000 Words, by E. L. Thorndike. All words which did not occur 
in the Thorndike first 10,000 were written on sheets of paper and a frequency 
count kept. The list thus included those words found only in the second 10,000 
of the Thorndike list and also those unusual words not found in Thorndike’s list 
at all. Thus was provided a group of difficult and technical words for each book. 

(2) Each of these lists of so-called difficult words was then checked with 
Warren’s Psychological Dictionary in order to single out those terms which 
were of a psychological nature. 


By this method there was secured for each of the eight educational 
psychology textbooks a frequency count of the various difficult 
psychological terms used in it. 
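
The two steps can be sketched in code as follows. This is a minimal illustration, not the writer's actual tabulation, which was done by hand: the function and argument names are hypothetical, the two word lists are assumed to be available as sets of lowercase words, and the grouping of related forms (for example cerebrum and cerebral) into one entry is not attempted.

```python
from collections import Counter


def difficult_psych_terms(odd_page_words, thorndike_first_10000, psych_dictionary):
    """Frequency count of difficult psychological terms for one book.

    odd_page_words        -- every word found on the odd pages, in order
    thorndike_first_10000 -- set of words in Thorndike's first 10,000
    psych_dictionary      -- set of entries in the psychological dictionary
    """
    # Step 1: tally every word that is not among Thorndike's first 10,000.
    difficult = Counter(
        word.lower()
        for word in odd_page_words
        if word.lower() not in thorndike_first_10000
    )
    # Step 2: keep only the words listed in the psychological dictionary.
    return {word: count for word, count in difficult.items()
            if word in psych_dictionary}
```

Applied to each of the eight books in turn, a routine of this kind would yield one frequency list per book.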


THE RESULTS 


The general findings of the study can be noted in detail in the vari- 
ous tables that have been arranged. Table I shows the psychological 
words found at least twenty times in one or more of the eight educa- 
tional psychologies studied. Perhaps the most striking observation 
from Table I is the fact that certain words are widely used by some 
authors but avoided entirely by others. Thus cerebrum and cerebral 
are used sixty-nine times by Judd on the odd pages of his book but 
used not at all by Commins, LaRue, and Mursell.! Case after case 
of this kind occurs. The words eduction and educe are used thirty- 
eight times by Commins, but six of the other writers have no use for the 





1 The reliability of the method of sampling which included all the words on odd 
pages is high. For Commins’ book the procedure used on the odd pages was 
duplicated for the even pages. When the frequency of Commins’ thirty most used 
psychological words based on the odd pages was correlated with the frequency of 
these same words on the even pages, an r of .92 was obtained. 

TABLE I.—PSYCHOLOGICAL WORDS FOUND AT LEAST TWENTY TIMES IN ONE OR 
MORE OF THE EDUCATIONAL PSYCHOLOGIES STUDIED 

Word | Commins | Douglas and Holland | Garth | Griffith | Judd | LaRue | Mursell | Sandiford 

Phas bacctcct ee cnshseveevedia 58 37 0 3 2 4 5 1 
Mi. L ed ead ah cond ob ekeed ne 11 3 1 50 3 20 22 7 
Cerebrum, cerebral.................. 0 20 5 1 69 0 0 11 
Chromosome, chromosomal.......... 12 0 4 3 0 0 13 31 
RL. oc vcudedacstaebuakeses 35 31 9 2 2 0 3 5 
Concept, conceptual, conceptualisation|; 36 64 25 56 7 45 28 9 
Contiguity, contiguous.............. 4 27 0 1 1 0 0 1 
Correlation, correlate, correlative... .. 102 35 12 15 3 2 4 34 
i coh bnceodesascawe 0 44 1 0 3 0 0 34 
Deductive, deduction, deduce........ 3 24 6 1 0 2 1 1 ’ 
Differentiation, differentiate.......... 43 1 0 4 4 0 28 10 
Discrimination, discriminative........ 22 8 0 9 6 3 2 13 r 
aR RE Aire IT a, ape 14 0 3 7 0 i 21 4 eth, 
Eduction, educe, eductive............ 38 1 0 0 0 0 0 0 ry 
Ct ci deahé.eh eee ee Cows Cue a 0 0 0 0 0 26 0 0 
ne. ce ton oan eulhe s san 5 0 0 0 0 0 26 1 ‘ 
Emotional, emotionality, emotionalize.| 26 42 24 85 7 15 22 51 
Environmental, environmentally...... 37 7 6 24 5 9 5 19 
DLS. «ck bbs chalk beb00 660hetebe 1 0 0 24 0 2 0 2 
Evaluate, evaluation, evaluative...... 22 14 1 0 2 4 6 3 
Evolve, evolutionary................ 5 0 1 0 25 0 0 3 
Generalization, generalize............ 27 3 5 4 28 5 12 2 i 
DUC tte cekend tocaberbesnaean 34 14 3 3 0 3 11 27 
Genetic, genetically................. 30 2 6 30 0 1 4 ll 4 
Se ae Oe. roel 21 0 2 0 0 0 0 1 , 
cn cavsastsieebesdenechus 15 36 15 0 0 0 0 9 oe 
Th ok. so ld2h pase ekeed es 3 27 0 0 1 7 0 0 he 
Inhibition, inhibit, inhibitory......... 20 19 3 19 1 7 1 11 ¥ 
Integration, integrate, integrative..... 65 s 6 7 2 1 23 18 
Interaction, interactive, interact...... 18 0 0 0 0 1 79 0 
Introspection, introspective, intro- 

spect, introspectionist............. 5 14 2 1 0 5 0 24 
ET eve edccecceccuat 22 11 3 6 4 5 3 5 
IE RI fA Pe 14 3 0 1 30 3 15 0 
Maturation, maturative, maturational.| 31 28 S 24 1 10 5 18 
DCL. cde veab bad ss ooeue bed 20 6 1 23 5 3 11 0 
Motivation, motivate............... 54 20 7 11 0 25 12 2 
ities Ciutat gh 6k ede bebeee 0 26 8 0 68 6 1 8 
ioc «db os ved cue wecdeh ae 0 38 12 0 0 0 1 28 
iti he a oo oe 13 28 6 0 0 1 12 ll 
Objective, objectivity............... 36 62 9 12 2 9 14 18 
Perceptual, percept.............0.-; 16 76 0 35 20 5 0 0 
a SES SE aE pa oe 38 39 10 14 4 2 1 57 
le i ea 0 0 5 0 0 69 0 18 
Psychology, psychological, psychologist| 145 143 105 120 141 36 129 134 
ESS a ee 0 36 0 0 0 0 0 0 
Es ina Ga oo en wale due ae aia 0 57 3 2 0 0 0 22 
I no on ou cn Rule's bu swaleae 28 18 2 3 2 1 6 2 
Retention, retentive................. 22 11 10 3 5 1 3 6 
MEG Uc 4x 6 40 a4cdaeewoenbalel 3 3 2 27 0 0 3 1 
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TaBLE I.—Continued 
































Word | Commins | Douglas and Holland | Garth | Griffith | Judd | LaRue | Mursell | Sandiford 
ae, irae ae Te atl 2 Lae Sale Releases 11 75 10 7 49 4 1 4 
EE ct ote ne tha Aaa ee Caw «+ eae 25 3 9 9 2 1 3 1 
Standardization, standardize......... 3 45 6 3 3 2 20 6 
SER ASE SER Ree EN yeaa a 2 1 0 20 1 0 2 0 
ING 5s 6 oe 5 6-clp bulb h.O aN 0.6 6 0 oa 20 44 8 1 44 5 8 6 
ES Se ee Oe ee a 180 41 7 86 7 28 22 51 
FORTS SRNR So, A eae ay 4 6 3 23 3 3 17 8 
Validity, validate, validation.........| 22 12 6 0 5 0 10 5 
ee te a See a oe 28 1 4 2 0 0 1 28 
Ps ies Cika sb ve Caaeh Chea ec ees 0 | 26 3 0 0 0 0 10 





expression at all. The only other book using this term besides Commins 
is Douglas and Holland, where the word educing is used once. In a 
similar manner the word propensity is used by LaRue sixty-nine times 
but used not at all by five of the writers. The word interaction is 
used seventy-nine times by Mursell but seldom used by the other 
educational psychologists. Five of them use it not at all. Douglas 
and Holland use the word neurone thirty-eight times and the word 
visceral twenty-six times, but four of the other writers do not feel any 
need to employ either of these terms. Table I contains fifty-nine 
psychological words widely used by some of the eight educational 
psychologists. Of these fifty-nine words, however, only a very small 
number are used at least once by all of the writers. 

Table II presents the fifteen most used psychological words of each 
of the authors. These might be called the key words of each book. 
The only word that occurs on all eight lists is the word psychology. 
Many of the words occur on only one list. Of such words, Commins 
uses eduction; Douglas and Holland use reactor; Garth has awareness, 
retention, modification, introversion; Griffith has genetic, rote, ethical, 
meaningful, static, inhibition; Judd has neural, stimulation, generaliza- 
tion, evolve, articulation, hypnotize, hand-writing, dissociation, and 
analytical; LaRue has elation, propensity, diagnose, curiosity, vocational, 
and creativeness; Mursell has interaction, emergence, dynamic, tension, 
and cultural; Sandiford has chromosome, variability, and introspection. 

A study of the original word sheets reveals that there are many 
words which are used by one and only one of the eight authors. In 
Table I can be found a few of such words. Some others are autistic, 

TABLE II.—FIFTEEN MOST USED PSYCHOLOGICAL WORDS OF EACH OF THE 
AUTHORS (KEY WORDS) 
































Commins Douglas and Holland Garth 
i See 180 1. Psychology......... 143 1. Psychology.......... 105 
2. Psychology........... 145 2. Perceptual.......... 76 SD GD, occcceeectes 25 
S. Corvrefation........... 102 Ds Ge ao vadccccce 75 3. Emotional........... 24 
4. Integration........... 65 GS, GR, . ciccccaces 64 4. Modification......... 19 
Py Pi adscevageéee 58 5. Objective........... 62 5. Awareness........... 15 
6. Motivation........... 54 6. Receptor........... 57 6. Hypothesis.......... 15 
7. Differentiation........ 43 7. Standardized....... 45 Set Pe cvcctccccccs 14 
ee  codacccese 38 lo csbecaede” Ge Wns cca ccestete 12 
9. Physiological.......... 38 9. Stimulation......... 44 9. Correlation.......... 12 
10. Environmental........ 37 | 10. Emotional....... 42 | 10. Retention........... 10 
i iD, -csene see 6 a. Mia Pn i.e thes ceeec Oe Beis Pence tc ccescen 10 
12. Concept.............. 36 | 12. Physiological....... 39 | 12. Physiological........ 10 
OD GIRS so» 6'o 6 6:0 ok Ee EO EL, cc anncies 62 38 | 13. Coefficient........... y 
ccc ccccatcse | Hi oes 66466 de 37 | 14. Objective............ 9 
15. Maturation........... 31 |] 16. Remetor............ 36 | 15. Introversion......... 4 
Griffith Judd LaRue 
1. Psychology........... 120 1. Psychology......... 141 1. Propensity.......... 69 
2. Trait... reef B, GR. « dew scice 69 es Soi ces cedars 45 
3. Emotional............ 85 ) > es 68 3. Psychology.......... 36 
EC Cvcctedtecsccee. Sm i. <c sete aoe 49 DE ches sccéaccees 28 
ee 50 5. Stimulation......... 44 RL Ss a cdatecs 26 
eS 6. Mathematical....... 30 6. Motivation.......... 25 
PRT + 065% 806.608 oe 30 7. Generalization...... 28 Drei ceneoceutaaes 20 
es ccnscceeecta) Se GS . odeidddwe! ee Bi Ee cccccccee & 
9. Maturation........... 24 9. Perceptual.......... 20 i, Cs oc esctege 16 
10. Environmental........ 24 | 10. Articulation........ 15 | 10. Emotional........... 15 
SRR eeccccccensvccs «BR RRR III do cen ois. RE Ry adore cdecces 12 
Se 23 | 12. Handwriting........ 14 | 12. Maturation.......... 10 
13. Meaningful........... 23 | 13. Analytical.......... 13 | 13. Creativeness......... 10 
ck oa 6 ek k 20 | 14. Dissociation........ 13 | 14. Objective............ 9 
i BRS Sc vc ctedwes BD F BB Wes caccccceces 12 | 15. Environmental....... Q 
Mursell Sandiford 

1. Psychology.......... 129 1.\Psychology.......... 134 

2. Interaction.......... 79 2. Physiological........ 57 

3. Differentiation....... 28 i SLT So da eskeccees 51 

Gs CE, coiscicecaae ER 4. Emotional........... 51 

5. Emergence.......... 26 i nan i éduccndas 34 

6. Integration.......... 23 6. Correlation.......... 34 

7. Emotional........... 22 7. Chromosome......... 31 

Par 22 SO 28 

DRL oh Géwdke sa ce 22 9. Variability.......... 28 

i a coe ccc ht (SR Bi Ge Bees concdee 27 

11. Standardization...... 20 | 11. Intrespection........ 24 

T, ncccccbccccce| BOT EEccccecepsecss OS 

Se ee 17 | 13. Environmental....... 19 

14. Mathematical........ 15 | 14. Maturation.......... 18 

SR, Weve ccoducccuce 14 | 15. Objective............ 18 





(The only word common to all eight lists is psychology) 
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fundament, and reminiscence used by Commins; audiometer used by 
Douglas and Holland; emote and adience used by Griffith; decibel, 
organismic, schizoid, and diencephalon used by Sandiford; irrational 
and dissociation used by Judd. 

There were a large number of words which were found frequently in 
at least two of the books but not in any of the other six. Some of 
these are: behavioristic, belongingness, compensatory, gradient, gustatory, 
individuation, lobe, negativism, paramecium, phasic, phobia, recapitula- 
tion, senility, tactual, and zygote. 

The twenty-one words in Table I which are used at least once by all 
the authors might be said to represent a sort of common technical 
vocabulary for educational psychology. These words are basic, con- 
cept, correlation, emotional, generalization, inhibition, integration, 
logically, physiological, psychology, reliability, retention, sensory, simi- 
larity, standardization, trait, typical, environmental, maturation, objective, 
and stimulation. They make a very small common core of psycho- 
logical words. Even the frequency of use of these words varies tre- 
mendously from author to author. 
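
The notion of a common core can be made concrete with a short sketch. The frequencies below are a handful of legible rows copied from Table I; the function names are hypothetical, and the test "used at least once by every author" is simply a frequency above zero in all eight columns.

```python
AUTHORS = ["Commins", "Douglas and Holland", "Garth", "Griffith",
           "Judd", "LaRue", "Mursell", "Sandiford"]

# A few rows from Table I (occurrences on the odd pages of each book).
FREQ = {
    "cerebrum":    [0, 20, 5, 1, 69, 0, 0, 11],
    "concept":     [36, 64, 25, 56, 7, 45, 28, 9],
    "correlation": [102, 35, 12, 15, 3, 2, 4, 34],
    "eduction":    [38, 1, 0, 0, 0, 0, 0, 0],
    "emotional":   [26, 42, 24, 85, 7, 15, 22, 51],
    "interaction": [18, 0, 0, 0, 0, 1, 79, 0],
    "psychology":  [145, 143, 105, 120, 141, 36, 129, 134],
}


def common_core(freq):
    """Words used at least once by every one of the authors."""
    return sorted(word for word, counts in freq.items()
                  if all(count > 0 for count in counts))


def non_users(freq, word):
    """Authors who do not use the given word at all."""
    return [author for author, count in zip(AUTHORS, freq[word]) if count == 0]
```

For these sample rows the common core is concept, correlation, emotional, and psychology, while cerebrum is unused by Commins, LaRue, and Mursell, in agreement with the text.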

The variety of terms and concepts in educational psychology is 
probably even greater than shown in this study, where only eight 
recent books were analyzed. Older textbooks in this field would no 
doubt reveal other distinctive vocabularies. In some education 
departments, Lewinian or topological psychology forms the basic 
material of the educational psychology course. If Lewin’s books are 
used as texts, the student must master such concepts as valence, vector, 
barrier, phenotype, genotype, etc. 


SUMMARY AND CONCLUSIONS 


The findings of this study very clearly show that educational 
psychologists in their textbooks differ widely in the terminology which 
they employ. As a result of this general situation, students finishing 
courses in educational psychology using different textbooks often speak 
very different educational and psychological languages. A student 
finishing a course using Commins as a text, for example, would most 
certainly be expected to know the meaning of such expressions as the 
eduction of relations, the eduction of correlates, and the meaning of the 
term fundament. Students using as texts any of the other seven books 
examined in this study would not be expected to know or use these 
expressions, but would instead have other special words and phrases 
for which they would be held accountable. 







A critic teacher who may later guide the practice-teaching of such 
students or a school principal or superintendent who may hire them 
cannot assume that they are in possession of certain common psy- 
chological concepts and terminology. This, of course, results in 
considerable misunderstanding and confusion. 

As psychology and education attain greater age as sciences and 
as we discover more about human behavior, there will probably 
be a tendency for the concepts and terminology to become better 
standardized. 

In the meantime it would appear that writers of introductory 
works in educational psychology might assist this process by forgoing 
the urge to coin new words to express old ideas, and by coming to 
some agreement as to what should be the basic content of educational 
psychology. 
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RELIABILITY OF MULTIPLE-CHOICE MEASURING 
INSTRUMENTS AS A FUNCTION OF THE 
SPEARMAN-BROWN PROPHECY FORMULA, IV* 


H. H. REMMERS 
Purdue University 
AND 
J. MILTON HOUSE 
Michigan City, Indiana, High School 


This study was conducted to determine if the change in reliability 
in multiple-choice tests, as related to the number of alternative choices 
per test item, is a function of the Spearman-Brown Prophecy Formula. 

The first study by Remmers, Karslake, and Gage* examined this 
hypothesis in terms of published coefficients obtained for a variety of 
achievement tests. Their results were not conclusive for or 
against the hypothesis, but on the whole they favored it. Possible reasons 
advanced for the inconclusive results are: Computational errors, scor- 
ing variation, faulty procedure in elimination of alternative responses, 
faulty administration of test. 

The second was an experimental study by Denny and Remmers‘ 
that attempted to control these variables. Four forms of a multiple- 
choice vocabulary test were constructed using the same word list and 
were similar except that the number of responses to each word varied 
from five in Form A to two in Form D. Approximately one thousand 
senior-high-school pupils were divided into four groups equated as to 
mean IQ and standard deviations and each group was given one form of 
the test. After computing the self-correlation for each form of the test 
(split-test procedure), it was found that the reliability increased as the 
number of responses increased, and that this increased reliability could 
be reliably predicted by the Spearman-Brown formula. 

The third study by Remmers and Ewart‘ represents an examination 
of the validity of this hypothesis as applied to the Harper Social Study 
Scale. Four experimental forms of “A Social Study” by Manly 





* A. K. Smith, principal of the Isaac C. Elston Junior High School, Michigan 
City, Indiana, extended fullest coöperation and all facilities of his building so that 
this study might be carried out. 

Appreciation is expressed to Wendell R. Godwin, Superintendent of the 
LaPorte, Indiana, Junior High School, for furnishing students with which to con- 
duct a preliminary survey of this test to determine its validity. 
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H. Harper were used with Form 1 having a two-point scale and pro- 
gressing to Form 4 with a seven-point scale. The test was adminis- 
tered to eight hundred eight students in beginning psychology courses, 
four hundred eighty-seven at Purdue University, and three hundred 
twenty-one at the University of Nebraska. Examination of the data 
shows that in this experiment the reliability of the attitude scale is 
increased as the number of possible responses to each item is increased 
up to five responses. Furthermore, this reliability can be predicted, 
within the allowable error, by the Spearman-Brown Prophecy Formula. 
This study gives some indication that as the number of responses 
increases beyond five, our hypothesis breaks down, and the reliability 
does not increase according to the Spearman-Brown Formula, if at all. 

In the present study Form 5 has five responses, Form 4 has four 
responses, Form 3 has three responses, and Form 2 has two responses 
and, using Form 2 as unit length, Form 3 is 3/2 units in length, Form 4 
is 4/2 units (or two units) in length, and Form 5 is 5/2 units in length. 
These respective values are substituted in the Spearman-Brown 
Formula: 

$$r_A = \frac{n\,r_B}{1 + (n - 1)\,r_B}$$

where $r_A$ equals the reliability of A, where $r_B$ equals the reliability of B, and 

$$n = \frac{\text{number of response alternatives in each item of A}}{\text{number of response alternatives in each item of B}}$$
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Analogous procedure is used in predicting reliabilities using, succes- 
sively, Form 3, Form 4, and Form 5 as unit length or base. 

A comparison is then made of these predicted r’s and the r’s ob- 
tained by the self-correlation of the “odd-even” items. 
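
The prediction step can be sketched as follows. The function names are hypothetical; the half-test reliabilities are those the study reports for its four forms, and n is taken, as the text describes, as the ratio of the number of responses per item in the form predicted to that in the base form.

```python
# Obtained half-test reliabilities reported for each form
# (keyed by the number of responses per item).
HALF_TEST_R = {2: 0.543, 3: 0.649, 4: 0.699, 5: 0.810}


def spearman_brown(r_base, n):
    """Reliability predicted for a test n times as long as the base test."""
    return n * r_base / (1 + (n - 1) * r_base)


def predict_from(base_form, target_form):
    """Predicted half-test reliability of target_form, using base_form as unit length."""
    n = target_form / base_form   # e.g. 3/2 when predicting Form 3 from Form 2
    return spearman_brown(HALF_TEST_R[base_form], n)
```

With Form 2 as base, `predict_from(2, 3)`, `predict_from(2, 4)`, and `predict_from(2, 5)` agree with the predicted values .641, .703, and .748 to within rounding in the third decimal. The same function with n = 2 steps a half-test coefficient up to the whole-test value, for example .810 to about .895 for Form 5.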

A sixty-item multiple-choice arithmetic test was constructed, each 
item having five responses, only one of which is correct but all having a 
possibility of correctness to those least informed. This test, to be 
given to pupils of seventh, eighth, and ninth grades, had to be made 
so that no item was completely beyond the ability of the pupils just 
completing their seventh year and that no item was so easy for ninth- 
year pupils as to be correctly answered by all of them. 

Using Form 5 as a base, three derivatives were constructed by 
successive chance elimination of wrong responses from each item. 
These derivatives, Form 4, Form 3, Form 2, have the same items as 
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Form 5 but with a decreasing number of responses as indicated; there- 
fore, a corresponding decrease in the length of the test. 

In the chance elimination of responses, five opaque bottles were 
used, each bottle having in it four balls, each ball with the letter A, B, 
C, D, or E printed on it. 

One of the five bottles had no ball with “A,” one had no ball with 
“B,” etc. Now suppose in Form 5 (whose responses are lettered A, B, 
C, D, and E) response “B” is the correct one for item 1. We pick up 
the bottle which contains no “B ball” and shake out a ball. If the 
“C ball” rolls out, we eliminate the C response from the item, thus 
creating item one of four responses for Form 4. We then shake out 
another, and another to create the corresponding item one for Form 3 
and Form 2. Each of the sixty items is treated in like manner and the 
responses are relettered A, B, etc., in all of the derived forms. 
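
The bottle procedure amounts to removing wrong responses one at a time in random order. A minimal sketch for a single item is given below; the function name is hypothetical, a pseudo-random shuffle stands in for shaking out the balls, and the final relettering of the remaining responses is omitted.

```python
import random

LETTERS = ["A", "B", "C", "D", "E"]


def derive_forms(correct, rng):
    """Response sets of one item for Forms 5, 4, 3, and 2.

    correct -- letter of the correct response in Form 5
    rng     -- a random.Random instance (stands in for the bottle)
    """
    # The bottle for this item holds the four balls for the wrong responses.
    bottle = [letter for letter in LETTERS if letter != correct]
    rng.shuffle(bottle)   # order in which the balls are shaken out

    forms = {5: list(LETTERS)}
    remaining = list(LETTERS)
    # The first three balls drawn create the item for Forms 4, 3, and 2 in turn.
    for size, ball in zip((4, 3, 2), bottle):
        remaining = [letter for letter in remaining if letter != ball]
        forms[size] = remaining
    return forms
```

Because each form is derived from the one before it, the response set of every shorter form is contained in that of every longer form, and the correct response is never removed.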

This test was to be given to pupils in the seventh, eighth, and ninth 
grades; hence, we must be sure that the items are of appropriate 
difficulty. No item must be beyond all pupils finishing the seventh 
grade, and no item should be so easy that all ninth-grade pupils would 
succeed on it. To obviate this possibility, Form 5 was given to fifty 
students of seventh-grade level and fifty students of ninth-grade level, 
in the LaPorte, Indiana, Junior High School. An item analysis of 
these results indicated that all items satisfied the conditions desired. 

The four forms, identical in items but differing in the number of 
responses per item, that is differing in length, were given to four groups 
making up the entire Junior High School of Michigan City, Indiana. 
These groups contained an equal number of seventh-, eighth-, and 
ninth-grade pupils, and were equated as to IQ means and standard 
deviations. The tests were given all at the same time by teachers who 
were instructed as to the technique which should be used. 

Table I shows the mean IQ, SD of the IQ’s, number tested, and form 
taken, of each group. 


TABLE I.—AVERAGE IQ, SD, AND NUMBER TESTED OF EACH OF THE FOUR 
GROUPS 

Group | Average IQ | SD | Number tested 
Group 5 (took Form 5) | 102.40 | 13.20 | 176 
Group 4 (took Form 4) | 101.73 | 13.15 | 198 
Group 3 (took Form 3) | 101.88 | 13.20 | 202 
Group 2 (took Form 2) | 102.53 | 13.30 | 195 


TABLE II.—RELIABILITY OF THE HALF TEST AND WHOLE TEST FOR EACH FORM 

 | Form 5 | Form 4 | Form 3 | Form 2 
Reliability coefficient of half test | .810 | .699 | .649 | .543 
Reliability coefficient of whole test | .895 | .823 | .787 | .704 














Each form of the test was given two scores: one score for the odd- 
numbered questions, viz., 1, 3, 5, etc., the other score for the even- 
numbered questions, viz., 2, 4, 6, etc. After these scores had been 
corrected for guess in the formula 

$$\text{Score} = \text{Right} - \frac{\text{Wrong}}{n - 1}$$

(n being the number of responses per item) they were correlated to 
determine the reliability coefficient of the half test. 
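
The scoring step can be sketched as below. The helper names are hypothetical, the divisor n - 1 is the conventional one for the correction for guessing (assumed here, n being the number of responses per item), and omitted items, represented by None, are treated as neither right nor wrong, which is also an assumption.

```python
from math import sqrt


def corrected_score(right, wrong, n_alternatives):
    """Score = Right - Wrong / (n - 1)."""
    return right - wrong / (n_alternatives - 1)


def half_scores(responses, key, n_alternatives):
    """Corrected scores of one pupil on the odd- and even-numbered items."""
    halves = []
    for start in (0, 1):   # index 0 is item 1 (odd), index 1 is item 2 (even)
        pairs = list(zip(responses[start::2], key[start::2]))
        right = sum(1 for given, correct in pairs if given == correct)
        wrong = sum(1 for given, correct in pairs
                    if given is not None and given != correct)
        halves.append(corrected_score(right, wrong, n_alternatives))
    return tuple(halves)


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Correlating the odd-item scores with the even-item scores over all pupils who took a given form gives the half-test reliability coefficient for that form.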
TABLE III.—COMPARISON OF OBTAINED AND PREDICTED RELIABILITIES 

Form | r obt. | r pred. | z obt. | σ z obt. | z pred. | z diff. | σ diff. | CR 

Reliabilities as Predicted from Form 2 
Form 3 | .649 | .641 | .773 | .071 | .760 | .013 | .100 | .13 
Form 4 | .699 | .703 | .865 | .071 | .873 | .008 | .110 | .01 
Form 5 | .810 | .748 | 1.127 | .076 | .968 | .159 | .175 | .91 

Reliabilities as Predicted from Form 3 
Form 2 | .543 | .522 | .608 | .072 | .621 | .013 | .121 | .11 
Form 4 | .699 | .711 | .865 | .071 | .889 | .024 | .110 | .11 
Form 5 | .810 | .756 | 1.127 | .076 | .986 | .141 | .175 | .81 

Reliabilities as Predicted from Form 4 
Form 2 | .543 | .537 | .608 | .072 | .600 | .008 | .121 | .01 
Form 3 | .649 | .635 | .773 | .071 | .750 | .023 | .100 | .23 
Form 5 | .810 | .743 | 1.127 | .076 | .957 | .170 | .175 | .97 

Reliabilities as Predicted from Form 5 
Form 2 | .543 | .611 | .608 | .072 | .711 | .103 | .121 | .85 
Form 3 | .649 | .718 | .773 | .071 | .904 | .131 | .100 | 1.31 
Form 4 | .699 | .773 | .865 | .071 | 1.028 | .163 | .110 | 1.48 
Table II shows the reliability of the half test and of the whole test
when estimated by the Spearman-Brown formula for each form.

Using any form as a base, as previously explained, the Spearman-
Brown formula is used in predicting the reliabilities of the other forms.
Table III shows a comparison of the obtained and predicted reliabilities
in terms of Fisher's z transformation.
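
The computation summarized in Tables II and III can be sketched in Python. This is only an illustrative sketch: the function names are hypothetical, and the lengthening factor k is assumed here to be the ratio of the numbers of response alternatives (for example, 3/2 when Form 3 is predicted from Form 2). That assumption is not stated in this section, although it reproduces the predicted values of Table III to within rounding.

```python
import math


def spearman_brown(r, k):
    """Predicted reliability when a test of reliability r is lengthened k times."""
    return k * r / (1.0 + (k - 1.0) * r)


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)


def critical_ratio(z_obt, z_pred, sigma_diff):
    """Absolute difference between two z values in units of its standard error."""
    return abs(z_obt - z_pred) / sigma_diff


# Half-test reliability of Form 2 stepped up to the whole test (Table II).
whole_form2 = spearman_brown(0.543, 2)

# Form 3 predicted from Form 2, ASSUMING k = 3/2 (ratio of alternatives).
r_pred_form3 = spearman_brown(0.543, 3 / 2)

# Critical ratio for that comparison, using the sigma of the difference
# reported in Table III (.100).
cr_form3 = critical_ratio(fisher_z(0.649), fisher_z(r_pred_form3), 0.100)

print(round(whole_form2, 3), round(r_pred_form3, 3), round(cr_form3, 2))
```

A critical ratio well below 1, as obtained here, is what the table reports for most comparisons: the obtained and predicted reliabilities do not differ by more than sampling error would allow.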


SUMMARY AND CONCLUSION 


A controlled experiment using seven hundred seventy-one junior-
high-school pupils was designed to test the hypothesis that increase in
reliability of multiple-choice arithmetic test items with increase in the
number of response alternatives per test item is predicted by the
Spearman-Brown formula. For such test items varying in number of
responses from two to five the experimental data completely support
the hypothesis.


INTELLIGENCE OF HIGH-SCHOOL SENIORS
IN HAWAII


T. M. LIVESAY 


University of Hawaii 


The following material represents a survey of the test intelligence 
of a majority of the high-school seniors in Hawaii for the year 1936. 
The 1935 edition of the American Council Psychological Examination 
was given in the Spring of that year to twenty-two hundred fifty-five 
students—twelve hundred sixty-four males and nine hundred ninety- 
one females—all but one hundred sixty-three of the total number of 
seniors in territorial high schools. 

Table I gives the distribution of total scores and percentage at each 
interval for the males, females and the total group, with the means, 
25th percentiles, medians, 75th percentiles, and standard deviations 
for the three distributions. 

It is apparent that the distributions approximate each other very 
closely. The distribution of cases by intervals, and percentage at each 
interval are quite similar. There are two hundred seventy-three more 
males than females despite the fact that census figures show the sexes 
to be approximately equal in each age group. 

It is noticeable also that the measures of central tendency and 
dispersion are very close for all three distributions. The median score 
for the females is slightly below that of the males, and the standard 
deviation slightly smaller than that of the males. 

The question as to whether these sex differences are significant or
not may be answered by three methods of comparison: (1) the over-
lapping of scores, or per cent of one group which reaches or exceeds the
median of the other; (2) the Coefficient of Variation, $V = 100\sigma / M$; and
(3) the Critical Ratio, $D / \sigma_{\text{diff.}}$. The respective results of these three
methods are as follows: (1) forty-eight per cent of the females reach or
exceed the median score of the males; (2) the females are ninety-eight
per cent as variable as the males; and (3) a Critical Ratio of .96 indi-
cates only eighty-three chances in one hundred that the difference is
greater than zero, or significant.
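
The conversion from a Critical Ratio to "chances in one hundred" can be illustrated with a short Python sketch. It assumes the usual normal-curve reading, in which the chances are one hundred times the area of the normal curve lying below the Critical Ratio; the text does not state this rule explicitly, but it agrees with the figure of eighty-three quoted above. The function name is hypothetical.

```python
import math


def chances_in_100(critical_ratio):
    """Chances in 100 that the true difference is greater than zero.

    ASSUMPTION: computed as 100 times the cumulative normal
    probability at the critical ratio.
    """
    phi = 0.5 * (1.0 + math.erf(critical_ratio / math.sqrt(2.0)))
    return 100.0 * phi


chances = chances_in_100(0.96)
print(round(chances))
```

On this reading a Critical Ratio of zero corresponds to fifty chances in one hundred, and ratios of about 3 or more round to one hundred.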

While the two sexes show approximate equivalence of performance
on the examination as a whole, quite the opposite is true when they
are compared on the separate tests comprising the examination.
Tables II and III give the measures necessary for such a comparison.
In Table II are given the means, the standard deviations of the dis-
tributions, and the standard errors of the means for both sexes on each
of the five tests; and in Table III the differences between means, the
standard errors of these differences, the Critical Ratios, the chances in
one hundred of a true difference greater than zero, and the sex favored
in each comparison.


TABLE I. DISTRIBUTION OF TOTAL SCORES AND PERCENTAGE AT EACH INTERVAL
FOR MALES, FEMALES AND TOTAL POPULATION

                          Male               Female               Total
Scores              Fre-      Per       Fre-      Per       Fre-      Per
                    quency    cent      quency    cent      quency    cent
…                      3       .2          1       .1          4       .2
…                      4       .3          4       .4          8       .3
…                      7       .5          3       .3         10       .4
…                     10       .8          5       .5         15       .7
…                     14      1.1          8       .8         22      1.0
…                     22      1.7         24      2.4         46      2.0
…                     43      3.4         21      2.1         64      2.8
…                     35      2.8         37      3.7         72      3.2
…                     63      5.0         45      4.5        108      4.8
…                     91      7.2         67      6.8        158      7.0
…                    109      8.6         73      7.4        182      8.1
…                    118      9.3         85      8.6        203      9.0
…                    128     10.1        108     10.9        236     10.5
…                    143     11.3        125     12.6        268     11.9
…                    138     11.0        137     13.8        275     12.2
…                    150     12.0        108     10.9        258     11.4
…                    107      8.5         91      9.2        198      8.8
…                     65      5.1         33      3.4         98      4.3
…                     11       .9         13      1.3         24      1.1
…                      3       .2          3       .3          6       .3
Total               1264    100.0        991    100.0       2255    100.0

Mean                   …                   …                122.17
25th percentile      83.01                 …                 83.82
Median                 …                   …                115.32
75th percentile     156.70                 …                155.32
SD distribution     51.885              49.830              51.000

It is evident from column four of Table III that the differences are
all significant, i.e., the chances are one hundred in one hundred of a
true difference greater than zero. The differences favor the males in
Completion, Arithmetic, and Analogies and the females in Artificial
Language and Opposites.
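
As a rough check on how the entries of Tables II and III hang together, the following Python sketch recomputes the standard errors and the Critical Ratio for the Completion test from the reported means, standard deviations, and numbers of cases. It assumes the conventional formulas for independent samples (standard error of a mean as the standard deviation divided by the square root of N, and the standard error of a difference as the root of the sum of squared standard errors); these formulas are not stated in the text, and the function names are hypothetical.

```python
import math


def se_mean(sd, n):
    """Standard error of a mean (ASSUMED formula: sd / sqrt(n))."""
    return sd / math.sqrt(n)


def se_difference(se_a, se_b):
    """Standard error of the difference between two independent means."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


# Completion test: values as reported in Table II (1264 males, 991 females).
se_male = se_mean(10.360, 1264)
se_female = se_mean(10.665, 991)
se_diff = se_difference(se_male, se_female)

# Critical ratio D / sigma_diff for the difference between the means.
cr_completion = (21.41 - 20.19) / se_diff

print(round(se_diff, 3), round(cr_completion, 2))
```

The result agrees closely with the tabled standard error of the difference (.447) and Critical Ratio (2.73) for Completion.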

The rather low measures of central tendency on the examination as 
a whole, and on the separate tests, would suggest that the quality of 
high-school students in Hawaii is somewhat inferior. However, two 
things should be kept in mind: (1) This group comprises all territorial 
high-school seniors (with the exception of the few absent on the day 
the test was given); and (2) many of these students are definitely 
handicapped in the use of English. 


TABLE II. MEANS, STANDARD DEVIATIONS OF THE DISTRIBUTIONS, AND STANDARD
ERRORS OF THE MEANS OF THE MALES AND FEMALES

Test                   Measure               Male    | Female
Completion             Mean                  21.41   | 20.19
                       σ distribution        10.360  | 10.665
                       σ mean                  .292  |   .339
Arithmetic             Mean                  18.91   | 12.08
                       σ distribution        13.045  |  9.255
                       σ mean                  .367  |   .294
Artificial language    Mean                  30.35   | 35.27
                       σ distribution        13.310  | 14.570
                       σ mean                  .375  |   .463
Analogies              Mean                  22.59   | 20.01
                       σ distribution        12.980  | 12.645
                       σ mean                  .365  |   .401
Opposites              Mean                  29.77   | 32.68
                       σ distribution         9.290  | 14.930
                       σ mean                  .261  |   .474














Norms available for the examination are based upon the per- 
formance of a select group—freshmen admitted to college. A com- 
parison of these test norms with corresponding measures for the group 
admitted to college from the above high-school seniors is given in 
Table IV. While there is still some discrepancy, the results here are 
quite different from those of Table I. 

TABLE III. SEX COMPARISONS IN TERMS OF THE CHANCES OF A TRUE DIFFERENCE

                        Difference               Critical    Chances in 100      Difference
Tests                   between      σ diff.     ratio       that the true       in favor
                        means                    D/σ diff.   difference is       of
                                                             greater than zero
Completion                1.22        .447         2.73          100             Male
Arithmetic                6.83        .470        14.53          100             Male
Artificial language       4.92        .596         8.26          100             Female
Analogies                 2.58        .542         4.76          100             Male
Opposites                 2.91        .541         5.38          100             Female

















Some of the above differences in comparative scores could probably 
be attributed to a definite English handicap on the part of a good many 
children in Hawaii. There is not enough space here to cite evidence, 
but many studies in print amply support this supposition. 


TABLE IV. COMPARISON OF HAWAII SCORES WITH TEST NORMS

Measures          Test norms¹  | Hawaii scores
…                   143.12     |    135.75
…                   183.55     |    167.05
…                   223.43     |    202.05











1 The Educational Record, Vol. xvi, p. 309. 


SUMMARY AND CONCLUSIONS 


(1) This study presents data on the test intelligence of twenty-two 
hundred fifty-five high-school seniors in Hawaii, with analyses of sex 
differences on total and subtest scores and suggestions as to a language 
handicap. 

(2) The measures of central tendency are slightly higher for the 
males but the differences are not significant. 

(3) There are definitely significant differences on subtest scores,
the males showing superiority on the Completion, Arithmetic, and
Analogies tests, and the females on the Artificial Language and Oppo-
sites tests.











A SIMPLE INDEX OF TEST RELIABILITY 


GUSTAV J. FROEHLICH 
University of Chicago 


THE PROBLEM 


The intelligent user of educational tests is interested in knowing 
the degree of consistency with which these instruments measure. 
He wants some numerical index of their reliability. In the case of a 
standardized test, a coefficient of reliability is ordinarily reported in the 
test manual. There are available for reference, too, a number of 
summaries and reviews of standardized tests in which reliability data 
are given. Perhaps the most complete recent publication of this kind 
is Buros’¹ The Nineteen Forty Mental Measurements Yearbook, which
includes highly critical reviews of five hundred twenty-four standard- 
ized tests published through September, 1940. 

Standardized tests, however, constitute only a small portion of the 
classroom teacher’s testing program. Most monthly, weekly, and 
daily quizzes are informal and teacher-constructed. For these, of 
course, there are no accompanying manuals, nor are they critically 
reviewed. Yet it is desirable to have some index of the reliability 
of such tests, for they yield a large share of the data upon which marks 
and promotions are based. It, therefore, becomes the teacher’s 
responsibility to determine the reliability, or consistency, with which 
the quizzes he constructs measure. 

In many cases the classroom teacher does not have the time to
collect the additional data needed to compute the customary coefficient
of reliability, i.e., to rescore the quizzes by odd and even items, or
construct and administer duplicate forms, or re-administer the original
form. Neither do many teachers have the necessary statistical back-
ground to compute a Pearsonian r or to handle the Spearman-Brown
prophecy formula. Consequently, these teachers could use an index
of test reliability which is obtained with a minimum amount of effort,
and which, at the same time, approximates in form and interpretation
the customary coefficient of reliability.

A few years ago Kuder and Richardson² developed an index of test
reliability which, as they showed, approximates the coefficient of
reliability as determined by the split-halves technique when used in
conjunction with the Spearman-Brown formula, but which does not
involve the computation of a Pearsonian r. Thus the Kuder-Richard-
son index requires no odd-even rescoring, no second administration of
the test, and no duplicate form to be constructed and administered.
This index is designed for use with an electric test scoring machine,
a piece of apparatus generally not available to the classroom teacher.
It is the purpose of this paper to show that a simple adaptation of the
Kuder-Richardson formula, which involves only (a) the number of
items in the test, (b) the mean of the test scores, and (c) their standard
deviation, will yield sufficiently accurate results to warrant its use
by classroom teachers.

¹ Buros, Oscar Krisen (Editor): The Nineteen Forty Mental Measurements Year-
book. Highland Park, N. J.: The Mental Measurements Yearbook, 1941, pp. 674.
² Kuder, G. F. and Richardson, M. W.: “The Theory of the Estimation of Test
Reliability.” Psychometrika, Vol. II, No. 3, 1937, pp. 151-160.


THE KUDER-RICHARDSON FORMULA 


One form of the Kuder-Richardson formula is given as¹

\[ r = \frac{n}{n - 1} \cdot \frac{\sigma^2 - n\,\overline{pq}}{\sigma^2} \qquad \text{(Formula 1)} \]

where $n$ is the number of items in the test, $\sigma$ is the standard
deviation of the distribution of obtained test scores, and $\overline{pq}$ is the
average variance of the test items. If the assumption is now made
that all of the test items have the same difficulty (that is, the same
percentage of testees, not necessarily the same individuals, miss each
item), then $\overline{pq}$ is equal to $\bar{p}$ multiplied by $\bar{q}$. Also in terms
of the assumption just made, $\bar{p}$ is equal to $M/n$, where $M$
is the mean and $n$, as before, is the number of items in the test.
Furthermore, since $\bar{q}$ by definition is equal to $(1 - \bar{p})$, then in
terms of the above assumption, $\bar{q}$ is equal to $(1 - M/n)$. Thus
Formula 1 may be written as


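\[ r = \frac{n}{n - 1} \cdot \frac{\sigma^2 - n(M/n)(1 - M/n)}{\sigma^2} \qquad \text{(Formula 2)} \]

which by a rearrangement of terms becomes

\[ r = \frac{\sigma^2 n - M(n - M)}{\sigma^2 (n - 1)} \qquad \text{(Formula 3)} \]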





Formula 3 involves no other terms but (a) the number of items in the
test, (b) the mean of the test scores, and (c) their standard deviation.¹
Even though Formula 3 is based on the assumption that all of the
test items have the same difficulty, it is also true that this formula will
yield sufficiently approximate indices when the individual test items
include a wide range of difficulty, i.e., as much as 80 points on a 100-
point scale. This is shown empirically below.

¹ Op. cit., Formula (20), page 158.
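
For a teacher with a short program at hand, Formula 3 reduces to a few lines of arithmetic. The Python sketch below is offered only as an illustration; the function name is hypothetical. The figures used for checking are those reported for The Wisconsin Achievement Test in Table I of this paper (number of items, mean, and standard deviation).

```python
def formula_3(n_items, mean, sd):
    """Reliability index from Formula 3.

    r = (sigma^2 * n - M * (n - M)) / (sigma^2 * (n - 1)),
    which rests on the assumption that all items are of equal difficulty.
    """
    variance = sd ** 2
    return (variance * n_items - mean * (n_items - mean)) / (
        variance * (n_items - 1)
    )


# (n, M, sigma) as reported in Table I.
r_language = formula_3(54, 35.53, 10.50)
r_history = formula_3(40, 25.71, 6.05)
r_battery = formula_3(224, 131.14, 33.45)

print(round(r_language, 3), round(r_history, 3), round(r_battery, 3))
```

Only the three quantities named in the text enter the computation; no item-by-item tally and no correlation coefficient are required.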


AN APPLICATION OF FORMULA 3 


The Wisconsin Achievement Test² was administered to some two
thousand individuals. By means of the split-halves technique and 
the Spearman-Brown prophecy formula, the coefficient of reliability 
was calculated for the five subtests and for the total test. A corre- 
sponding series of indices of test reliability was then computed by 
using the adaptation of the Kuder-Richardson formula (Formula 3) 
developed above. The two sets of reliability coefficients are compared 
in Table I. 

Table I shows that the coefficients of reliability obtained by using
Formula 3 (the adaptation of the Kuder-Richardson formula)
approximate quite closely the indices of reliability obtained when
the split-halves technique is used in conjunction with the Spearman-
Brown prophecy formula. In general, the difference between the two
indices tends to decrease as the reliability increases. In no instance
is the difference greater than .058. The rank order of the Kuder-
Richardson indices correlates perfectly with that of the Spearman-
Brown indices. In other words, both indices agree in designating the
most reliable subtest, the next most reliable, and so on to the least
reliable subtest.

¹ While the present paper was in preparation Paul L. Dressel (Psychometrika,
Vol. V, No. 4, 1940, pp. 305-310) and C. J. Hoyt (Educational and Psychological
Measurement, Vol. I, No. 1, 1941, pp. 93-95) published other variants of the Kuder-
Richardson formula. The Dressel article derives the Kuder-Richardson formula
in a manner independent of the original Kuder-Richardson discussion, and then
suggests a simplification for use with a calculating machine. The Hoyt article
develops a formula similar to one suggested by Dressel, but its practicability is
limited, especially for long tests, when no electric test scoring machine is available.
Neither Dressel nor Hoyt presents empirical data of the kind given in the present
paper.

² This Wisconsin Achievement Test is an achievement battery covering Language
Usage, American History, General Science, Geometry, and Algebra. Its items
are taken from the files of the Wisconsin Achievement Testing Program. The test
was arranged and standardized by the author of the present paper for use in con-
nection with a study that he made of the 1939-1940 freshman class at the Univer-
sity of Wisconsin.

TABLE I. SPEARMAN-BROWN VS. KUDER-RICHARDSON COEFFICIENTS OF
RELIABILITY FOR The Wisconsin Achievement Test
(These Data Are Based on 2151 Cases)

                                              Spearman-Brown     Kuder-Richardson    Differ-
Test                  n  |   σ   |    M    |   r¹  | Rank order |   r   | Rank order |  ence
                                                     of r's               of r's       in r's
Language usage       54  | 10.50 |  35.53  | .934  |     4      | .907  |     4      |  .027
American history     40  |  6.05 |  25.71  | .826  |     6      | .768  |     6      |  .058
General science      66  | 11.63 |  38.66  | .932  |     5      | .895  |     5      |  .037
Geometry             24  |  7.79 |  10.52  | .966  |     2      | .942  |     2      |  .024
Algebra              40  | 10.26 |  20.82  | .951  |     3      | .928  |     3      |  .023
Complete battery    224  | 33.45 | 131.14  | .973  |     1      | .956  |     1      |  .017





























1 The Probable Errors of the Spearman-Brown reliability coefficients vary from 
.0015 for the Complete Battery to .0086 for the American History subtest. The 
Probable Errors vary inversely as the magnitude of the reliability coefficients. 


It should be pointed out that all of the obtained Kuder-Richardson
coefficients are smaller than their respective Spearman-Brown coeffi-
cients. As a matter of fact, the Kuder-Richardson formula as here
used will always underestimate the reliability to the extent that the
difficulty of the test items tends to differ from item to item.
Only when all items are of the same difficulty does the Kuder-Richard-
son r become an exact index. The Kuder-Richardson r's in Table I
are, therefore, to be considered as conservative estimates of reliability;
for, as shown in Table II, there is a great variation in item difficulty
in The Wisconsin Achievement Test. For the item difficulty assump-
tion to be met perfectly, all entries in Table II would have to be zero.


TABLE II. A SUMMARY OF ITEM DIFFICULTY IN The Wisconsin Achievement Test¹

                        Range of item     Sigma of difficulty
Test                    difficulty        distribution
Language usage              75                  18
American history            71                  16
General science             77                  21
Geometry                    62                  15
Algebra                     74                  19
Complete battery            83                  20

¹ In terms of a 100-point percentage scale.


Thus, even though the Kuder-Richardson r, as computed in this
study, will always tend to be conservative (for it would be mere
coincidence for an informal quiz to contain items of equal difficulty),
such underestimation, as shown in Table I, tends to be so small that
Formula 3 is recommended for practical use by classroom teachers.

CONCLUSIONS 
(1) The index of test reliability obtained from the formula,

\[ r = \frac{\sigma^2 n - M(n - M)}{\sigma^2 (n - 1)} \]

where $n$ is the number of items in the test, $M$ is the mean of the
test scores, and $\sigma$ is the standard deviation of the test scores, is
approximately equal to the corresponding index obtained when the
split-halves technique is used in conjunction with the Spearman-
Brown prophecy formula.

(2) The coefficient of test reliability obtained when the above 
formula is used will always underestimate the true reliability to the 
extent that the difficulties of all of the test items tend to deviate from 
one another. 

(3) Such underestimation is not serious, however. The greatest 
difference between Kuder-Richardson r’s (computed by means of the 
simple formula developed in this paper) and Spearman-Brown r’s was 
found to be less than .06 for the data considered; whereas, the item 
difficulty range for that instance was 71—on a 100-point percentage 
scale—and the standard deviation of the difficulty distribution of 
items was 16. 

(4) The rank order of the reliability indices obtained by using the 
formula developed in this paper correlates perfectly with the rank 
order of the corresponding series of Spearman-Brown indices. 

(5) Length of the test and qualities inherent in the test as a whole 
affect the Kuder-Richardson index of reliability in approximately 
the same manner as they affect the Spearman-Brown index. 

(6) Because the adaptation of the Kuder-Richardson formula, as
used in this paper, yields satisfactory results, and because it involves
a minimum of time and statistical background, it is recommended for
use by busy classroom teachers.







FURTHER ANALYSIS OF THE RESULTS OF SPEED 
DRILLS WITH THE METRON-O-SCOPE 


RAY H. SIMPSON 
College of Education, University of Alabama 


In making a survey of experimental evidence for and against the use 
of the Metron-O-Scope the writer came across some evidence by Garver 
and Matthews¹ which would seem to justify a further analysis and
somewhat different conclusions from those given by the authors. 

Garver and Matthews conducted an experiment which involved: 
(1) Administration of Form A of the Iowa Silent Reading Tests to 
members of the slowest sections of the seventh, eighth, and ninth grades 
of the Coatesville, Pennsylvania, Junior High School; (2) Training in 
reading by means of the Metron-O-Scope for a period of from twelve to 
seventeen minutes each Monday and Thursday for ten weeks; (3) 
Administration of Form B of the Iowa Silent Reading Tests. (Tests on 
the Ophthalm-O-Graph were also made but are not of concern here.) 

Two statements of results are the following: 

“There was also a more than normal average development in terms
of total comprehension score as determined by the Iowa Silent Reading
Tests.” (p. 695)

“Each class section also made normal advancement in average total
score in comprehension on the Iowa Silent Reading Tests.” (p. 696)

The fact that no controls at all were used for forty-seven of the 
sixty-seven subjects in the experiment may account for the uncertainty 
with respect to whether ‘‘normal”’ or “more than normal”’ progress was 
made. 

At one point the authors say that there were “twenty-four members
in the ninth-grade class” (the lowest section of the ninth grade)
(p. 695), and on the next page (p. 696) they say that “to determine the
effects on comprehension from a ten weeks' series of intensive drills
whose sole (?) purpose was to increase the rate of reading, another
ninth-grade section was used as a control group for the ninth-grade
section used in the experiment. The control group was the next to the
slowest section of the seven sections into which the ninth grade was
classified. There were thirty pupils each in these two lowest sections
of the ninth grade.” Were there thirty members of the ninth-grade
class, or were there only four-fifths of thirty, or twenty-four? The
reader cannot tell, since he is given both numbers and obviously both
cannot be correct.

¹ Garver, F. M. and Matthews, R. D.: “An Analysis of the Results of Speed
Drills with the Metron-O-Scope to Increase Reading Rate.” Journal of Educa-
tional Psychology, Vol. xxx, 1939, pp. 693-698.

It finally develops that (p. 696) there were six pairs of girls and 
fourteen pairs of boys in the experiment. The ranges in ages, intelli- 
gence quotients, and reading grade achievements according to school 
records were: 














                                                     School-grade levels
                            CA             IQ       Comprehension |    Rate
Control group          15-4 to 17-10  |  64-112  |   6.2 to 8.9   | 3.0 to 8.5
Experimental group     14-1 to 18-8   |  61-108  |   5.2 to 9.2   | 3.0 to 12.0

















The number of cases is small (twenty for each group). Although 
matching is claimed, the matching is very doubtful since the lowest CA 
in the control group is 15-4 and the youngest individual in the experi- 
mental group was fifteen months younger. The poorest comprehen- 
sion level in the one group is 6.2 and in the other it is 5.2. Are these 
two individuals matched for experimental purposes? The highest rate 
in the one group is 8.5; the highest in the other is 12.0—a difference of 
three and one-half grades. Are these two individuals matched in a 
meaningful way? 

Finally, some questions may well be raised concerning the inter- 
pretation which has been made of the results. The results given for 
this part of the experiment are: 





                            Rate, words per minute       Comprehension score on Iowa Test
                        Initial | Final | Difference     Initial | Final | Difference
Control group             244   |  263  |     19           108   |  115  |      7
Experimental group        205   |  295  |     90            94   |  104  |     10

First, were the groups matched in any real sense so that differences
would be meaningful?

Second, did the authors give sufficient attention to the fact that the
part of the Iowa Tests which is called the Rate Test is only a two-
minute test while the other five Iowa Tests are also rate tests in that
they are timed reading tests and speed of reading is a major factor in
success with them? Would not the thirty-three minutes of timed
reading in the other five tests be of more significance than the two
minutes of timed reading? Should not the results indicate that from
a practical standpoint the training by means of the Metron-O-Scope is
of doubtful value?

Third, what other values did the control subjects achieve during the 
time that the experimental subjects were being trained in the use 
of the Metron-O-Scope? 

The writer believes that this article emphasizes the need for great 
care to be taken in (1) giving factual information about the subjects in 
any scientific experiment, (2) matching control and experimental 
groups, and (3) interpreting results of an experiment. 











NOTE ON LINE OF RELATION METHOD OF
ESTABLISHING AGE OR GRADE NORMS¹


ROGER T. LENNON 


Division of Test Research, World Book Company 


It is customary in establishing age or grade norms for a test to find 
the mean scores for successive age or grade groups and pass a line 
through these points; thus the norm line is essentially the line of 
regression of score on age. An alternative method of establishing 
norms has been proposed and used in connection with certain tests; 
namely, the so-called “line of relation” or correspondence method of 
determining norms. According to this second method, norms which 
have been established for one test may be applied to a second test by 
determining the correspondence of scores on the two tests, and assign- 
ing to any given score on the second test the age or grade equivalent 
of the corresponding score on the first test. The correspondence of 
scores is determined by administering both tests to the same group of 
pupils and finding the scores in the two tests corresponding to selected 
percentiles or standard deviation points. This method has been exten- 
sively used in setting up norms for alternate forms of a test after an 
original form has been standardized. It has been suggested that the 
use of this method be extended to situations in which the two tests are 
not quite so comparable as are equivalent forms of the same test; e.g., 
in establishing norms for an achievement test by equating it to an 
already standardized intelligence test. 

It is the purpose of this paper to call attention to the condition 
under which the two methods of establishing norms—namely, the 
independent derivation method and the correspondence method— 
yield identical results. 

Let us consider two tests for both of which it is proposed to set up 
age norms. (The more usual situation is that in which one of the 
two tests has already been standardized, but the argument which 
follows applies in either case.) We shall designate these Test 1 and 
Test 2. 

The score on Test 1 for any given age may be found from the usual
regression equation; namely, $\bar{x}_1 = b_{1a} x_a$ (assuming, for simplicity, linear
regression),

where $\bar{x}_1$ = score on Test 1 (expressed as a deviation from its mean),
$b_{1a}$ = regression coefficient of score on age,
and $x_a$ = age for which score is being estimated (expressed as a
deviation from its mean).

¹ The author is indebted to Dr. Arthur S. Otis for the suggestion leading to the
preparation of this note.

But $b_{1a} = r_{1a}\,\sigma_1 / \sigma_a$, or simply $r_{1a}$ if the scores are all expressed in
standard deviation units. Thus $\bar{z}_1 = r_{1a} z_a$ (expressing the measures as
standard scores).

Similarly, the score for any given age on Test 2 may be found from
the equation $\bar{z}_2 = r_{2a} z_a$.

Thus, according to this customary technique for establishing norms,
the equivalent values on the two tests (that is, the values corresponding
to the same age) would be $r_{1a} z_a$ and $r_{2a} z_a$. (It is assumed that Test 2
is being standardized on the same population as, or a population
comparable to, that used in standardizing Test 1.) Only in the event
that $r_{1a} = r_{2a}$ will $\bar{z}_1 = \bar{z}_2$.

However, when norms are established by the line of relation or
correspondence technique the equivalent values are by definition such
that $z_1 = z_2$, since they are the scores corresponding to the same sigma
values or to the same percentile points.

It is, therefore, obvious that the two techniques yield identical
results only in the case that $r_{1a} = r_{2a}$, that is, in the case that the
correlation of scores on each of the two tests with age is the same.
Note that the magnitude of these correlations is irrelevant, as is 
likewise the correlation between the two instruments themselves, 
although this latter value will, of course, be affected by the respective 
correlations with age. 

The practical implication of the foregoing discussion is the necessity 
for examining this factor of correlation with age whenever it is proposed 
to use the line of relation technique; and, if the correlations of the two 
instruments with age are not equal, either to adhere to an independent 
derivation of norms, or to introduce the ratio of the two correlations 
as a correction factor in establishing the correspondences by the line 
of relation. 
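
The argument may be made concrete with a small numerical sketch in Python. The correlations used are purely hypothetical values chosen for illustration, and the function names are likewise hypothetical. The sketch compares the norm on Test 2 obtained by independent derivation with the value assigned by the line of relation, with and without the ratio of the two correlations applied as a correction factor.

```python
def regression_norm(r_test_age, z_age):
    """Standard score expected at a given age under linear regression."""
    return r_test_age * z_age


def corrected_equivalent(z1, r1a, r2a):
    """Line-of-relation equivalent on Test 2, corrected by the ratio r2a / r1a."""
    return z1 * (r2a / r1a)


# HYPOTHETICAL correlations of each test with age, and an age one
# standard deviation above the mean.
r1a, r2a = 0.60, 0.45
z_age = 1.0

z1_norm = regression_norm(r1a, z_age)   # norm on Test 1 for this age
z2_norm = regression_norm(r2a, z_age)   # independently derived norm on Test 2

# The uncorrected line of relation sets the Test 2 equivalent equal to z1.
z2_uncorrected = z1_norm
z2_corrected = corrected_equivalent(z1_norm, r1a, r2a)

print(z2_norm, z2_uncorrected, round(z2_corrected, 4))
```

With unequal correlations the uncorrected correspondence assigns the age norm to a Test 2 score that differs from the independently derived one, whereas the corrected value coincides with it; with equal correlations the two methods agree without correction.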

We have in the foregoing discussion considered only the case of age 
norms. The same reasoning applies equally, mutatis mutandis, to 
grade norms. The applicability of the line of relation technique in the 
establishment of percentile norms, let us say, for a single age or single 
grade group, is not touched upon in this note. 








BOOK REVIEWS 


LYMAN BRYSON. The New Prometheus. New York: The Macmillan
Company, 1941, pp. 107.


This thirteenth volume in the Kappa Delta Pi lecture series is a 
strong argument for an increase in the emphasis the public schools 
place upon teaching children to value and employ the scientific 
method. Bryson insists that we should not be blamed for having 
tried to teach men to think clearly and “self-lessly’”’; but we should 
be blamed because we have failed. He argues that this development 
of critical intellects is, or better should be, the school’s big business. 
The reviewer agrees; although he, as well as Bryson, realizes that 
pupils’ lives cannot be neatly divided into the mental, the physical,
and the emotional. It is a question of emphasis. 

Bryson has little patience with the group that cries, “Let’s stop
thinking and do something.” He recognizes that this is an old slogan,
that it is much easier to do the wrong thing than to think a way
through to what is right (p. 65). The essay is concluded with the
statement that it was not intended as a manifesto, but rather as a plea
for man to regain his faith in reason. In the author’s words:

“This new Promethean enterprise is more than just giving fire to 
men. Fire is dangerous unless men know how to use it. This enter- 
prise is to give men both power and the knowledge of its possible uses 
for their good. It is to help them from suffering at the hands of those 
who have knowledge and would use it against them. It is to give 
common ownership to effective thought and also to the knowledge of 
what there is in the world worth having, including freedom and how 
to keep it. The teacher is the friend who makes men free” (pp. 
106-107). STEPHEN M. COREY. 


University of Chicago. 


LUELLA COLE. The Background for College Teaching. New York: 
Farrar & Rinehart, 1940, pp. 616. 


This ambitious book resulted from the conviction that the college 
teacher of today is confronted with many problems his predecessors 
did not have to face. He finds himself in the midst of the struggle 
“between the ‘new’ and the ‘old’ educational methods, . . . between 
the specialist and the general culturist, . . . between the educational 
expert and the subject-matter expert” (p. vi). The author’s purpose 
is to present in organized fashion the big ideas derived from some six 
hundred references on American higher education. The result is a 
systematic discussion of: (1) Our present-day institutions of higher 
learning, (2) the college student, (3) the problems of collegiate class- 
work, (4) the social and economic aspects of college teaching, and 
(5) the evaluation of college instruction. 

The first one hundred eighteen pages describe the American college 
—its types, objectives, curricula, student population, and personnel 
work. The reviewer’s inference was that American higher education 
has little uniformity, that its objectives have changed from a complete 
concentration upon the academic to some emphasis upon the personal 
needs of students, that social pressure constantly influences its curric- 
ula, that college students continue to be a select group, and that the 
modern college has introduced personnel work “for the purpose of 
treating students as individuals” (p. 118). 

The second section of the book is a treatise on the educational 
psychology of the late adolescent years, that is, on the physical, 
mental, social, emotional, and moral adjustment and development of 
college students. In general, the major principles are well organized 
and helpful. Two items, in particular, arrested the attention of the 
reviewer: First, the dogmatic statement that “. . . teachers should 
teach worth-while material, but they should not try to teach only what 
is interesting” (p. 233); and second, “As far as I can find out, there is 
no experiment reporting the learning of even one student in a single 
college subject. No one knows at what rate or by what methods 
students master any field of knowledge. Such results as there are 
relate mostly to the acquisition of specific skills or to the development 
of specific abilities that form only a small part of an entire course” 
(pp. 272-273). 

The section on the problems of collegiate classwork will help the 
inexperienced college teacher who has had little training in pedagogy. 
It contains specific and well-illustrated discussions of class size and 
learning, teaching techniques, examinations, marking systems, and 
characteristics of inferior and superior college students. 

It is unfortunate that the author crowded her analysis of the social 
and economic aspects of college teaching into sixty-eight pages; for 
frequently the relationship of the college teacher to his superiors, his 
peers, and his students makes him or breaks him. Despite her brevity, 
the author’s frank and open statements relative to administrative 
organization, professional activities, salary, promotion, and ethics 
reveal some of the obstacles to successful college teaching. 

After reading the last section of the book, the reviewer was left 
with the impression that the author suddenly realized that her manu- 
script was becoming too long, and hence ought to be finished up as 
rapidly as possible. The discussion of the evaluation of college 
instruction is very brief. Even though it is granted that most studies 
of teaching success have been carried out at levels below that of the 
college, it would seem, nevertheless, that the thirty-odd references 
cited deserve more discussion than was given them. Of interest is the 
author’s conclusion that “the rating scale has come to stay. Before 
long it is likely to be as much of a fixture in American colleges as the 
objective test already is” (p. 593). 

As a “background for college teaching” the book is an excellent 
one. It enables the beginning college teacher to view the American 
college as a complete enterprise. It will appeal to professional students 
of education and to college personnel workers, but its very inclusive- 
ness may irritate the typical subject-matter specialist. Although the 
author professes to present facts rather than to interpret them, the 
“personnel” point of view is consistently favored. The student, 
rather than the subject or the teacher, should be the focal point in 
American higher education. 

The form, style, and make-up of the book are excellent. The 
numerous cleverly conceived pictograms go a long way toward remov- 
ing the monotony of what might easily have been a series of common- 
place statistical tables. The frequent excerpts from actual case 
studies humanize the treatise, and the chapter summaries add to the 
reference value of the book. GUSTAV J. FROELICH. 

The University of Chicago. 


JOSEPH JUSTMAN. Theories of Secondary Education in the United 
States. New York: Bureau of Publications, Teachers College, 
Columbia University, 1940, pp. 481. 


That practice depends upon theory is axiomatic. This is as true 
in education as it is in medicine, engineering, industry, or any other 
form of human activity. However, the multiplicity of educational 
practices evident in even the most cursory survey of schools in the 
United States makes one wonder if there is any basic educational 
theory. If one turns from examples of practice to the writings of 
educational theorists (that is, frequently, those who are safely away 
from the front lines), there is still found an over-abundance of 
differences. In fact, anyone seriously interested in education as a part of 
our society becomes discouraged, if not confused, when he endeavors to 
read half a dozen books by as many authors all purporting to discuss 
the theoretical basis of modern education. Justman has undertaken 
to reduce to some order the confusing currents of educational theory, 
at least as they relate to secondary education, defined in terms of a 
time period, i.e., adolescence, rather than in terms of educational 
technique. 

From study of the extensive writings on educational theory Justman 
discerns four major “systems of cognate theory.” These four are 
humanism, social evolutionism, social realism, and experimentalism. 
The authors included in each of these are as follows. Humanism: 
Butler, Foerster, Hutchins, Kandel, and Learned; social evolutionism: 
Bagley, Judd, and Morrison; social realism: Briggs, H. R. Douglass, 
Spaulding, and Cox; experimentalism: Hopkins, Thayer, Dewey, 
Kilpatrick, Bode, Childs, and Counts. There are, of course, differences 
between the men in any one group, but it is the author’s purpose 
to study the differences among the four, rather than the variations 
within them. The social dynamics, psychological foundations, mean- 
ing, and methods of secondary education are discussed from the view- 
point of each of the four major systems. Within each of these, e.g. the 
discussion of social dynamics from the humanist point of view, general 
principles are first presented, followed by an evaluation of the present 
situation and then a statement of the nature of constructive proposals. 
In a final chapter the author, who personally favors the Social realists, 
presents a critical summary of the four systems. 

It is the purpose of this review to evaluate not the systems which 
the author discusses, but his success in doing so. In the first place 
Justman shows complete impartiality. While there may be occasional 
parenthetical comments, humanism is presented in as objective a 
fashion as is social realism, or experimentalism as social evolutionism. 
Furthermore, the presentation indicates that the author has grasped 
the varying points of view, has realized the communalities and has 
presented the viewpoints in his own lucid style. Quotations from the 
original do not intrude; they serve to emphasize by their very rarity. 
In the final critical chapter it appears to this reviewer that the author 
has done an adequate job of summing up, but has not pointed out the 
strengths and weaknesses of the four systems too clearly. The 
organization of the book is clear-cut and consistent. The purpose was 
to investigate what current educational theories say about certain 
major problems of secondary education. Another organization, with 
a section devoted to each of the systems and a final comparative 
critical chapter, might have been better for emphasizing educational 
theory. C. M. LOUTTIT. 

Indiana University. 


MURDO MACKENZIE. The Human Mind. Philadelphia: Blakiston, 
1941, pp. 204. 


This little book by Murdo Mackenzie, an English consulting 
physician in psychological medicine, is reading which one would like to 
prescribe for every troubled mind in these troubled times. There is 
scarcely any intelligent adult possessing even a modicum of apprecia- 
tion of his own weaknesses who could read these pages without some- 
where recognizing himself in them, and without gaining considerable 
insight into the faulty thinking which may be responsible for his 
difficulties. The author’s analysis of maladjustment in terms of 
disordered thinking is practical and sound. His illustrative cases are 
of an excellence rarely encountered in books of this sort; since they 
are seldom extreme, they clarify the discussion not only through their 
pertinency, but also through their resemblance to men and women we 
know. The occasional application of his ideas to the “national mind,” 
and particularly his last chapter, written since the outbreak of war, on 
“Mind and the National Emergency,” are interesting and provocative. 
One wishes, however, that Mackenzie wrote more coherently and with 
less use of so specialized a terminology. These deficiencies moderate 
the enthusiasm with which one can prescribe the book for the layman. 

Underlying the author’s obvious aim, which is an exposition of the 
ways of the human mind and the suggestion of therapeutic approaches 
to disordered thinking, is a less superficially apparent, though fre- 
quently reiterated, purpose; namely, a plea for regarding the human 
mind as a separate organ suitable for special investigation: “a 
discrete organ with a specific function, working in rhythm in space and 
time and driven by strictly mental forces.” The concept of mind as a 
discrete organ implies an entity and a dualism which are somewhat at 
variance with modern psychological emphasis upon the integrated 
organism. If, however, he is merely insisting that mental dysfunc- 
tion can, and usually should, be interpreted in terms of mental forces, 
there will be few to disagree with him. The author’s convictions as to 
the nature of the working rhythm of the mind, one can only guess from 
his examples of its arrhythmia, although he does say that “the psy- 
chological standard is whether it [the organ of mind] is functioning 
satisfactorily, and not whether it approximates to some indefinable 
standard of a normal mind.” 

The first half of the book is devoted to an exposition and exemplification 
of the basic “mental forces,” the “attitudes of mind” which 
may result from their “displacement,” and the “primary mental 
arrhythmias.” The exposition does not develop very logically, and 
lucidity is achieved more by virtue of the examples than the discussion. 
The basic mental forces are really what most modern psychologists 
would describe as fundamental mental attitudes, or habits of thinking 
with regard to people and events about us. The author believes each 
of them is an “innate bent” of the mind. They are two pairs of 
opposed forces: Immediacy and Deliberation, Simplification and 
Amplification. The Immediate “increases the intensity of the 
moment,” emphasizes the importance of the present, responds aggressively; 
the Deliberate “diminishes the intensity of the moment, 
thinks in terms of the past and the potential, responds pacifically.” 
These are the forces which drive the mind in terms of time. Simplifica- 
tion and Amplification, on the other hand, drive it in terms of space. 
The Simplifier thinks in terms of a unifying principle—contracts the 
facts into a unifying principle; the Amplifier thinks in terms of evidence 
—arranges the facts into a chain of evidence. The former is exempli- 
fied in Hegel, Newton, and Stanley Baldwin; the latter in Karl Marx, 
Einstein, and Lloyd George. Amplification has characterized the 
thinking of our era, but “the psychological pendulum is again on the 
swing, and Simplification and not Amplification is likely to be in 
the centre of the picture for the next twenty years or more.” 

The four “attitudes of mind” which may develop from the displacement 
of these mental forces are Hysteria, Obsession, Depression, 
and Assertion. Obsession and Depression are reactions of Delibera- 
tion, but the Obsessive is an Amplifier, while the Depressive is a 
Simplifier. Hysterics and Assertives are Immediates, but the Hysteric 
is also driven by Amplification, and the Assertive by Simplification. 
Depression and Assertion the author believes to be the most frequent 
dysfunctions of thinking prevalent in the British Isles. 

The “primary mental arrhythmias,” disturbances of “regularity 
and rate of function,” are Anxiety-thinking and Apathy-thinking. 
The former is an accelerated arrhythmia and is accompanied by a sense 
of super-awareness; the latter is a retarded arrhythmia accompanied 
by a sense of unreality, and is always a sequela of permanent Anxiety- 
thinking. The author’s discussion of the various forms of Anxiety- 
dictated behavior is excellent. 

The interrelationships of the “mental forces,” the “arrhythmias,” 
and the “attitudes of mind” may be summarized somewhat as follows: 
Alteration in the regularity and rate of mental functioning may be 
manifested in the primary arrhythmia of Anxiety-thinking. The 
individual’s specific Anxiety-reaction, however, will depend upon his 
innate attitude of mind (Hysteria, Obsession, Depression, Assertion). 
Which of these characterizes him is in turn dependent upon the dis- 
placed mental forces (Deliberation or Immediacy, Simplification or 
Amplification) by which he is driven. Thus, the Anxiety-dictated 
behavior of the Hysteric, driven by misplaced Immediacy and Amplification, 
will differ from that of the Depressive, driven by the displaced forces of 
Deliberation and Simplification. 

The second half of the book is concerned with the diagnosis, 
description, and treatment of Depression, the attitude of mind which 
the author feels to be most important and most neglected. Depression 
is characterized by a persisting sense of inadequacy and a pacific 
retreat. 


In Depression, the forces of Deliberation and Simplification drive the mind 
to contemplate its own inadequacy, rather than to comprehend and make 
contact with the world around—which is their proper function. When the 
forces are displaced, Deliberation ordains a pacific retreat; and Simplification, 
a unifying principle of inadequacy. It is this shift of attention from the wide 
dynamic world without, to the limited and relatively static world within, that 
makes useful examination possible. 


The Depressive, himself a Deliberate, over-valuates the Immediates, 
“who can put things across and who revel in organization.” 


The problem arises in every occupation and profession, and always 
presents the same features—an absence of a sense of full achievement; a plea 
of inadequacy which justifies this; and an affirmation that the sufferer’s 
technical superiority is outshone by that of some rival who has the capacity 
“to impress people.” The argument is advanced that, nowadays, technical 
ability is at a discount, and the fellow who matters is the organiser or the show- 
man who can put things across. 


Does that not have a familiar ring to us all? The aim of therapy is to 
lead the Depressive to direct the forces of Deliberation and Simplifica- 
tion away from the contemplation of his own inadequacy and bring 
them to bear upon the problems and situations of life. He must be 
led to realize that Deliberation and Simplification are essential and of 
great worth in the thought and activity of the world. 

The author discusses Depression as it may arise from, or be mani- 
fested in, the conflicts surrounding the achievement of the basic 
“human needs” of sex, subsistence, and religion. He discusses it in 
relation to the early and middle years of life. And, finally, he dis- 
cusses it most provocatively in relation to the National Mind. Delib- 
eration and Simplification are the innate forces which drive the British 
mind, and these have come into conflict during the past fifty years with 
the European forces of Immediacy and Amplification. One of the 
consequences of the clash is the apathy which has characterized 
British international behavior. 


If, then, industry and the contemporary intellectual trends are antithetic 
to the prevailing British mentality, it is reasonable to suggest that there has 
been a tyranny of Immediacy and Amplification over Deliberation and 
Simplification, and that this has induced some degree of national Apathy. 


Dictatorship, Nationalism, Nazi propaganda—all these are expressions 
of Immediacy and Amplification. British propaganda, on the other 
hand, is Deliberate and Simplified. 


Deliberate and simplified exposure of the Anxiety-dictated urge that com- 
pels dictators to remove their own great men—such as Von Fritsch and 
Balbo—is far more profitable British propaganda, than trying to out-threaten 
the threateners. 


The aim of Nazi terrorism is to produce fatigue-Anxiety and its 
sequela, Apathy. Thus far it has succeeded in creating only 
temporary-Anxiety, which insists on increased mental effort and vanishes 
with the immediate danger. 


May the deliberate Simplification of this great country be released to its 
utmost, and its mental efficiency tuned to peak point; and the inefficiency of 
persisting Anxiety, the inertia of Apathy and, above all, the plea of inadequacy 
so much in evidence for the past fifty years, disappear for ever from its mental 
atmosphere at home and its corresponding influence abroad! 

CARLETON F. SCOFIELD. 


University of Buffalo. 